Graphical systems and methods for human-in-the-loop machine intelligence

ABSTRACT

Methods and systems are disclosed for creating and linking a series of interfaces configured to display information and receive confirmation of classifications made by a natural language modeling engine to improve organization of a collection of documents into a hierarchical structure. In some embodiments, the interfaces may display to an annotator a plurality of labels of potential classifications for a document as identified by a natural language modeling engine, collect annotated responses from the annotator, aggregate the annotated responses across other annotators, analyze the accuracy of the natural language modeling engine based on the aggregated annotated responses, and predict accuracies of the natural language modeling engine's classifications of the documents.

CROSS REFERENCES TO RELATED APPLICATIONS

This application claims the benefits of U.S. Provisional Application 62/089,736, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR ANNOTATING NATURAL LANGUAGE PROCESSING,” U.S. Provisional Application 62/089,742, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR IMPROVING MACHINE PERFORMANCE IN NATURAL LANGUAGE PROCESSING,” U.S. Provisional Application 62/089,745, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR IMPROVING FUNCTIONALITY IN NATURAL LANGUAGE PROCESSING,” and U.S. Provisional Application 62/089,747, filed Dec. 9, 2014, and titled, “METHODS AND SYSTEMS FOR SUPPORTING NATURAL LANGUAGE PROCESSING,” the disclosures of which are incorporated herein by reference in their entireties and for all purposes.

This application is also related to U.S. non-provisional applications (Attorney Docket No. 1402805.00006_IDB006), titled “METHODS FOR GENERATING NATURAL LANGUAGE PROCESSING MODELS,” (Attorney Docket No. 1402805.00007_IDB007), titled “ARCHITECTURES FOR NATURAL LANGUAGE PROCESSING,” (Attorney Docket No. 1402805.00012_IDB012), titled “OPTIMIZATION TECHNIQUES FOR ARTIFICIAL INTELLIGENCE,” (Attorney Docket No. 1402805.00014_IDB014), titled “METHODS AND SYSTEMS FOR IMPROVING MACHINE LEARNING PERFORMANCE,” (Attorney Docket No. 1402805.000015_IDB015), titled “METHODS AND SYSTEMS FOR MODELING COMPLEX TAXONOMIES WITH NATURAL LANGUAGE UNDERSTANDING,” (Attorney Docket No. 1402805.00016_IDB016), titled “AN INTELLIGENT SYSTEM THAT DYNAMICALLY IMPROVES ITS KNOWLEDGE AND CODE-BASE FOR NATURAL LANGUAGE UNDERSTANDING,” (Attorney Docket No. 1402805.00017_IDB017), titled “METHODS AND SYSTEMS FOR LANGUAGE-AGNOSTIC MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING USING FEATURE EXTRACTION,” (Attorney Docket No. 1402805.00018_IDB018), titled “METHODS AND SYSTEMS FOR PROVIDING UNIVERSAL PORTABILITY IN MACHINE LEARNING,” and (Attorney Docket No. 1402805.00019_IDB019), titled “TECHNIQUES FOR COMBINING HUMAN AND MACHINE LEARNING IN NATURAL LANGUAGE PROCESSING,” each of which is filed concurrently herewith, and the entire contents and substance of all of which are hereby incorporated in total by reference in their entireties and for all purposes.

TECHNICAL FIELD

The subject matter disclosed herein generally relates to creating one or more interfaces for processing human verification of natural language model accuracies. In some embodiments, a natural language modeling engine displays certain information on interfaces to confirm the categorization, classification, or sorting that the natural language modeling engine has performed for a set of documents into a scheme comprising various labels that may be present in the documents. In some embodiments, the interfaces are dynamically linked to one another, such that interactions or verifications on one interface may adjust the information displayed on a linked interface.

BACKGROUND

Human communications in the digital age provide a deluge of information. Compounding the sheer volume of human communications in this technological era is the multitude of formats such human communications may come in, such as public news articles, social media posts, emails, customer feedback comments on a vendor website, or information circulated within a closed environment. It is difficult and time-consuming for a human to integrate or recognize trends within these various communication formats given the volume of content involved in each, and any broad appreciation for what the creators of such communications may be expressing is therefore delayed. Recognizing trends or underlying meanings across the vast number of human communication sources, and appropriately categorizing unique nomenclature or slang that may be embedded in them, cannot be done efficiently without the aid of computer modeling tools.

Artificial intelligence tools can attempt to analyze and classify these information sources through metadata or other identifiers, but cannot efficiently analyze the true meaning of the internal content without verification oversight that is time-consuming and costly to implement. Efficient verification of natural language modeling of a collection of documents is therefore needed to ensure that such computer modeling tools accurately recognize appropriate categories of human communications within a collection of documents.

BRIEF SUMMARY

Graphic interface systems and methods for enabling human verification of natural language modeling by computer analysis tools, such as a natural language modeling engine, are described. In some embodiments, the verification is achieved by presenting a document with a list of potential labels or tasks describing the document on a work unit interface and aggregating the responsive inputs to the work unit interface (hereinafter, such responsive inputs are referred to as “annotations”). An annotation is not necessarily a confirmation of a natural language modeling engine's prediction of a label or task of a document. An annotation to a document or subset of a document (referred to as a “span”) generally includes information indicating how the document or span should be classified into one or more topics or categories. In some embodiments, an annotation is a corrective departure from the natural language modeling engine's prediction. Aggregated annotations can, in some embodiments, be displayed on related interfaces and further manipulated to determine the accuracy of a natural language modeling engine in categorizing the document into a hierarchical structure of labels and/or “tasks” associated with the document.

A “task,” in some embodiments, can be a clarification, reflection, sentiment, or other objective surrounding a document or label, such as, merely by way of example, “positive” or “negative” or various degrees between. “Tasks” may also refer to genres or groupings within labels and not merely binary interpretations of a label. In this disclosure, the hierarchical structure may also be thought of as a categorization, or classification, and is hereinafter referred to as an “ontology.”

In annotating a document, the work unit interface may, in some embodiments, highlight or otherwise visually distinguish (such as by underlining or italicizing) only a portion of the document; such a portion of a document is referred to as a “span.”

In some embodiments, a collection of documents is accessed through a natural language modeling engine. The natural language modeling engine can organize the documents into an ontology by analyzing, grouping, and classifying them based on the words within the documents, using the natural language modeling engine's logic processing or past keywords used for classification.

In some embodiments, the ontology is displayed on a first graphic user interface (GUI). The first GUI can include an option for the user of the first GUI, such as a project manager analyzing the documents, to send selected documents to a series of annotators to confirm the accuracy of the label or task for the particular document within the ontology as initially determined by the natural language modeling engine.

In some embodiments, to facilitate annotation, the document is presented as part of a work unit interface displaying relevant panes for efficiently verifying the accuracy of the natural language modeling engine's ontology. In some embodiments, a work unit interface is constructed by integrating a document, a label or task, a guideline describing the label/task or distinguishing it from other labels/tasks, and a human readable prompt soliciting a response from the annotator. In some embodiments, the document is displayed in a document pane of the work unit interface, and a series of eligible labels or tasks are displayed in a label pane adjacent to the document pane. According to various embodiments, there may be one or more labels or tasks presented in the label pane of the work unit interface. In some embodiments, the guideline describing the label or task is displayed on the work unit interface as a reference button adjacent to its respective label or task, and when activated by a user of the work unit interface (such as by “clicking,” or “pressing” or hovering a cursor over the reference button) the reference button opens a guideline pane displaying the label or task description.
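As a minimal sketch, assuming a Python data model that the disclosure does not itself specify, the parts of a work unit interface (document, labels or tasks with their guidelines, prompt, and an optional span) can be held in one record, with the reference button resolving to a guideline lookup. The names `Label`, `WorkUnit`, and `guideline_for` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Label:
    """A label or task that may appear in the label pane (hypothetical model)."""
    name: str
    guideline: str  # text shown in the guideline pane


@dataclass
class WorkUnit:
    """Hypothetical record behind one work unit interface."""
    document: str                        # content of the document pane
    labels: list[Label]                  # contents of the label pane
    prompt: str                          # content of the prompt pane
    span: tuple[int, int] | None = None  # optional highlighted [start, end) range

    def guideline_for(self, label_name: str) -> str:
        """Return the guideline opened by the reference button beside a label."""
        for label in self.labels:
            if label.name == label_name:
                return label.guideline
        raise KeyError(label_name)


# Example work unit with the word "raised" highlighted as a span.
unit = WorkUnit(
    document="Chase raised its overdraft fees again.",
    labels=[Label("fees", "Mentions of bank charges"),
            Label("service", "Customer service quality")],
    prompt="Choose the label that best applies to this document.",
    span=(6, 12),
)
```

Activating the reference button next to "fees" would then correspond to calling `unit.guideline_for("fees")`.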

In some embodiments, the human readable prompt is displayed in a prompt pane of the work unit interface. In some embodiments, the human readable prompt is generated by an intelligent queuing module of a natural language modeling engine. Depending on the embodiment, the human readable prompt requests confirmation of a label or task of the document displayed in the document pane as predicted by the natural language modeling engine, requests the user of the work unit interface to select the most applicable label or task for the document from among a plurality of labels or tasks displayed in the label pane, or requests the user to identify all labels or tasks that are related to the document. One of skill in the art will appreciate many variations on the human readable prompt.
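The three prompt styles just listed can be sketched as a small generator that an intelligent queuing module might call. This is an illustrative assumption only: the enum, the function name `make_prompt`, and the exact prompt wording are hypothetical and not taken from the disclosure.

```python
from __future__ import annotations

from enum import Enum


class PromptMode(Enum):
    """The three prompt styles described above (names are hypothetical)."""
    CONFIRM = "confirm"        # confirm the engine's predicted label
    SELECT_ONE = "select_one"  # pick the single most applicable label
    SELECT_ALL = "select_all"  # mark every related label


def make_prompt(mode: PromptMode, predicted_label: str | None = None) -> str:
    """Build the human readable prompt text for the prompt pane."""
    if mode is PromptMode.CONFIRM:
        if predicted_label is None:
            raise ValueError("a confirmation prompt needs the predicted label")
        return f'Does the label "{predicted_label}" apply to this document?'
    if mode is PromptMode.SELECT_ONE:
        return "Choose the single label that best applies to this document."
    return "Select every label that applies to this document."
```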

In some embodiments, the work unit interface is presented on a second GUI displayed to an expert annotator. The second GUI, when displaying a work unit interface, may include a create pane configured to receive an additional label or task, or a revised guideline for a respective label or task within the label pane of the work unit interface. When a create label/task or create guideline input is received from an expert annotator on the second GUI, the second GUI may update the respective work unit interface on one or more second or third GUIs also configured to display the particular work unit interface. Third GUIs, in some embodiments, display the work unit interface to annotators that are not expert annotators, and such work unit interfaces do not include a create pane.

In some embodiments, updates from second GUIs may change what is displayed on work unit interfaces displayed on other second or third GUIs, and may also affect which work unit interfaces are displayed to other expert annotators on a second GUI or to other annotators on a third GUI. In some embodiments, an update to the work unit interface may be the replacement of a label or task with a new label or task entered into a create pane. In some embodiments, an update to the work unit interface may be a new human readable prompt for selection of a label or task (for example, an expert annotator could direct a work unit interface to prompt “choose a label” rather than “confirm if the presented label is applicable”). In some embodiments, an update to the work unit interface may replace an existing guideline with a revised guideline entered into a create pane. In some embodiments, an update to the work unit interface may supplement the given label(s) or task(s) with the label or task entered into a create pane. In some embodiments, an update to the work unit interface may supplement a given guideline to a label or task with a revised guideline entered into a create pane.
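The update types above can be sketched as methods on a work unit that push every change to each GUI currently showing it. This assumes an observer-style list of refresh callbacks standing in for real second and third GUIs; the source does not describe the propagation mechanism, and the class and method names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkUnit:
    """Hypothetical work unit whose edits propagate to every displaying GUI."""
    document: str
    prompt: str
    guidelines: dict[str, str]  # label or task -> guideline text
    # Callbacks standing in for the second/third GUIs showing this unit.
    displays: list[Callable[["WorkUnit"], None]] = field(default_factory=list)

    def _publish(self) -> None:
        """Notify every linked GUI so it can redraw the work unit."""
        for refresh in self.displays:
            refresh(self)

    def replace_label(self, old: str, new: str, guideline: str) -> None:
        """Replace a label or task with one entered in the create pane."""
        del self.guidelines[old]
        self.guidelines[new] = guideline
        self._publish()

    def add_label(self, new: str, guideline: str) -> None:
        """Supplement the existing labels or tasks with a new one."""
        self.guidelines[new] = guideline
        self._publish()

    def revise_guideline(self, label: str, text: str, supplement: bool = False) -> None:
        """Replace a guideline, or append to it when supplement is True."""
        current = self.guidelines[label]
        self.guidelines[label] = f"{current} {text}" if supplement else text
        self._publish()

    def set_prompt(self, prompt: str) -> None:
        """Swap in a new human readable prompt."""
        self.prompt = prompt
        self._publish()
```

A callback list keeps the sketch independent of any particular UI toolkit; a deployed system would more likely push updates over the network connecting the GUIs.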

In some embodiments, the third GUI operated by an annotator, or the second GUI operated by an expert annotator, receives an action on the work unit interface responsive to the human readable prompt. In some embodiments, the received action is an annotation of a label or a task of the document as requested by the human readable prompt. In some embodiments, the annotation is aggregated with all annotations to that particular work unit interface received from all third GUIs and second GUIs that were presented the work unit interface.
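Aggregation can be pictured as counting, per work unit, the labels chosen across all second and third GUIs. The sketch assumes each annotation arrives as a hypothetical `(work_unit_id, annotator_id, label)` tuple; `aggregate` and `majority` are illustrative names, not part of the disclosure.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def aggregate(annotations: Iterable[tuple[str, str, str]]) -> dict[str, Counter]:
    """Count the labels chosen for each work unit across all annotators."""
    by_unit: defaultdict[str, Counter] = defaultdict(Counter)
    for unit_id, _annotator_id, label in annotations:
        by_unit[unit_id][label] += 1
    return dict(by_unit)


def majority(counts: Counter) -> tuple[str, float]:
    """Return the most chosen label and the share of annotations it received."""
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())
```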

In some embodiments, the aggregated annotations are collected by the natural language modeling engine and displayed on the first GUI for further analysis and interaction by, for example, a project manager for the collection of documents, though one of skill in the art will envision other users or suitable roles for operating the first GUI to interact with an aggregation of annotations.

In some embodiments, the aggregated annotations are displayed on the first GUI in an annotation agreement interface comprising a series of information panes. In some embodiments, a label feedback pane of the annotation agreement interface displays a plurality of label panes, one for each label or task within an ontology, showing the number of annotations the respective label or task received and options to delete or edit the label or task. In some embodiments, the annotation agreement interface includes a learning curve pane; the learning curve graphically represents the number of annotations received for a particular label or task and the agreement among those annotations for the accuracy of that particular label or task. In some embodiments, the annotation agreement interface includes an annotation feedback pane. Depending on the embodiment, the annotation feedback pane may display an aggregate annotation agreement score representing the overall accuracy of the ontology as determined by the annotation agreements across all labels or tasks of the ontology. In some embodiments, the annotation feedback pane includes an individual annotator agreement list displaying the agreement score of each individual annotator relative to the other annotators.
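The disclosure does not fix a particular agreement metric. As one hedged possibility, the sketch below uses plain pairwise percent agreement (the fraction of annotator pairs on the same work unit who chose the same label), computed both overall and per annotator; chance-corrected measures such as Cohen's or Fleiss' kappa could be substituted. The input layout, a mapping from work unit to `{annotator: label}`, and both function names are assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def pairwise_agreement(labels_by_unit: dict[str, dict[str, str]]) -> float:
    """Aggregate score: share of annotator pairs, over all units, that agree."""
    agree = total = 0
    for labels in labels_by_unit.values():
        for a, b in combinations(labels.values(), 2):
            total += 1
            agree += a == b
    return agree / total if total else 0.0


def annotator_agreement(labels_by_unit: dict[str, dict[str, str]]) -> dict[str, float]:
    """Per-annotator share of pairings with other annotators that agree."""
    agree: defaultdict[str, int] = defaultdict(int)
    total: defaultdict[str, int] = defaultdict(int)
    for labels in labels_by_unit.values():
        for (ann_a, lab_a), (ann_b, lab_b) in combinations(labels.items(), 2):
            same = lab_a == lab_b
            for ann in (ann_a, ann_b):
                total[ann] += 1
                agree[ann] += same
    return {ann: agree[ann] / total[ann] for ann in total}
```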

In some embodiments, the annotation feedback pane includes a suggestion pane for collapsing labels or tasks into one another. By collapsing individual labels or tasks into groups of labels or tasks, the annotation agreement interface can reduce disagreements between annotators of a particular label or task, or reduce confusion annotators may have over minute differences between labels or tasks. For example, if the work unit interface displays labels of “securities” and “stock” for a particular document, an annotator may have trouble distinguishing the two, and the respective labels will have a low annotation agreement score; however, if the two labels were recalculated as a common label, the annotation agreement score may improve. Such collapsed label or task calculations may indicate, to a project manager or user of a first GUI, the need to refine the labels or tasks within the ontology, or the need to refine the guidelines describing the labels or tasks.
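The securities/stock example can be reproduced with a small sketch, assuming pairwise percent agreement as the score (the disclosure does not name a metric) and a hypothetical `merged` mapping from each original label to its collapsed group. The sample annotations are invented for illustration.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_agreement(labels_by_unit: dict[str, dict[str, str]]) -> float:
    """Share of annotator pairs, over all units, that chose the same label."""
    agree = total = 0
    for labels in labels_by_unit.values():
        for a, b in combinations(labels.values(), 2):
            total += 1
            agree += a == b
    return agree / total if total else 0.0


def collapse(labels_by_unit: dict[str, dict[str, str]],
             merged: dict[str, str]) -> dict[str, dict[str, str]]:
    """Relabel every annotation through `merged` (old label -> common label)."""
    return {
        unit: {ann: merged.get(label, label) for ann, label in labels.items()}
        for unit, labels in labels_by_unit.items()
    }


# Illustrative annotations: two annotators per document.
trades = {
    "d1": {"a": "securities", "b": "stock"},
    "d2": {"a": "stock", "b": "stock"},
    "d3": {"a": "fees", "b": "securities"},
}
group = {"securities": "securities/stock", "stock": "securities/stock"}
before = pairwise_agreement(trades)
after = pairwise_agreement(collapse(trades, group))
```

With this sample, agreement rises from one pair in three to two pairs in three once "securities" and "stock" are treated as one label, the kind of comparison a collapsed-agreement view would surface.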

In some embodiments, the annotation agreement interface includes an agreement per label or task graphical representation, such as a bar chart displaying the agreement per label or task relative to the other labels or tasks in the ontology that received annotations. In some embodiments, the agreement per label or task graphical representation further includes a collapsed agreement per label or task graphical representation displaying the agreement per label or task if two or more labels or tasks were collapsed into one another.
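For the per label or task bar chart, one assumed definition (not given in the source) is: among annotator pairs in which at least one annotator chose a given label, the share in which both chose it. The sketch computes that score and renders it as a plain-text bar chart; `agreement_per_label` and `text_bar_chart` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations


def agreement_per_label(labels_by_unit: dict[str, dict[str, str]]) -> dict[str, float]:
    """For each label, share of pairs involving it in which both annotators chose it."""
    agree: defaultdict[str, int] = defaultdict(int)
    total: defaultdict[str, int] = defaultdict(int)
    for labels in labels_by_unit.values():
        for a, b in combinations(labels.values(), 2):
            for label in {a, b}:  # a set, so an agreeing pair counts once
                total[label] += 1
                agree[label] += a == b
    return {label: agree[label] / total[label] for label in total}


def text_bar_chart(scores: dict[str, float], width: int = 20) -> str:
    """Render scores as bars, highest agreement first."""
    rows = sorted(scores.items(), key=lambda kv: -kv[1])
    return "\n".join(
        f"{label:<10} {'#' * round(score * width)} {score:.2f}" for label, score in rows
    )
```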

In some embodiments, the annotation agreement interface includes a per document agreement list displaying information such as which documents received the highest agreement or which documents received the lowest agreement. A user of the first GUI could potentially remove documents with lower annotation agreement from the ontology to improve the accuracy of the model, much in the same way other analysis methods remove ambiguous information or “noise” from their data sets.
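A per document agreement list and the optional removal of low-agreement documents might look like the following sketch, assuming pairwise percent agreement per document and a user-chosen threshold. Documents with only one annotation have no agreement score here and are kept rather than dropped; that policy, the threshold, and the function names are all assumptions for illustration.

```python
from __future__ import annotations

from itertools import combinations


def document_agreement(labels_by_doc: dict[str, dict[str, str]]) -> dict[str, float]:
    """Pairwise agreement per document; documents with < 2 annotations are skipped."""
    scores = {}
    for doc, labels in labels_by_doc.items():
        pairs = list(combinations(labels.values(), 2))
        if pairs:
            scores[doc] = sum(a == b for a, b in pairs) / len(pairs)
    return scores


def rank_documents(scores: dict[str, float]) -> list[str]:
    """Document ids ordered from highest to lowest agreement."""
    return sorted(scores, key=scores.get, reverse=True)


def drop_low_agreement(labels_by_doc: dict[str, dict[str, str]],
                       scores: dict[str, float],
                       threshold: float) -> dict[str, dict[str, str]]:
    """Keep documents at or above the threshold, plus any that are unscored."""
    return {
        doc: labels for doc, labels in labels_by_doc.items()
        if doc not in scores or scores[doc] >= threshold
    }
```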

The aggregated annotations, either in their original annotation form, or as manipulated through the first GUI by the various panes described above, can then be used to display verified analysis of the natural language modeling engine's ontology and indicate trends and sentiments contained within the collection of documents for that ontology with a certain degree of reliability (the annotation agreement score most readily serving as a proxy for the reliability of the ontology). For example, an ontology comprising a collection of thousands of “tweets” from a Twitter hashtag of #Tesla could divide the tweets into labels based on common words across the tweets, such as “battery,” “autonomous,” and “Elon Musk,” with tasks related to each label such as “positive” or “negative,” and display the number of tweets that fall within each label and task, the number of annotations to each tweet, and the annotation agreement among annotators to give a fast overview of the general disposition of the tweets within the ontology.

These and other embodiments of the present disclosure, along with many of their advantages and features, are described in more detail in conjunction with the text below and attached figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.

FIG. 1 is a network diagram illustrating an example network environment suitable for performing aspects of the present disclosure, according to some embodiments.

FIG. 2 is a system diagram showing a first, second, and third graphic user interface (GUI) operably coupled to a natural language modeling engine, according to some embodiments.

FIG. 3 illustrates a sample interface display of a first GUI accessing a collection of documents, according to some embodiments.

FIG. 4 illustrates a sample interface display of a first GUI presenting a landing page for a user of a first GUI to access features and maintenance information, information about a collection of documents, and annotation assignments, according to some embodiments.

FIG. 5 illustrates a sample interface display of a first GUI for accessing a plurality of documents to create a collection of documents, according to some embodiments.

FIG. 6 illustrates a sample interface display of a first GUI for reviewing automatic topic modeling incident to an ontology of a collection of documents based on keywords within the collection, according to some embodiments.

FIG. 7A illustrates a sample interface display of a first GUI for creating an ontology of a collection of documents based on keywords entered into the interface, according to some embodiments.

FIG. 7B illustrates a sample interface display of a first GUI for visually presenting topic relevance and relationship to other topics, according to some embodiments.

FIG. 8A illustrates a sample work unit interface arrangement of panes for collecting annotations from an expert annotator, according to some embodiments.

FIG. 8B illustrates a sample work unit interface arrangement of panes for collecting annotations from an annotator, according to some embodiments.

FIG. 8C illustrates a sample work unit interface arrangement of panes for displaying guideline information of a label or task, according to some embodiments.

FIGS. 8D-E illustrate sample work unit interface arrangements of panes for annotating and selecting spans, according to some embodiments.

FIG. 9A illustrates a sample interface display of a label feedback pane within a first GUI for displaying characteristics of a label within an ontology, according to some embodiments.

FIG. 9B illustrates a sample interface display of a label feedback pane within a first GUI for displaying a learning curve of annotation agreement as a function of the number of annotations received, according to some embodiments.

FIG. 10A illustrates a sample interface display of an annotation feedback pane within a first GUI for displaying annotation information relative to labels or tasks and annotators, according to some embodiments.

FIG. 10B illustrates a sample interface display of an annotation feedback pane within a first GUI for displaying graphical representations of annotation agreements relative to collapsing labels into one another, according to some embodiments.

FIG. 10C illustrates a sample interface display of a label feedback pane within a first GUI for displaying annotation agreements relative to specific documents within the collection of documents.

FIG. 11 illustrates a sample interface display of a rules feedback pane within a first GUI for displaying options to adjust the modeling and interpretation of certain labels or tasks, according to some embodiments.

FIG. 12 illustrates an example method for creating and integrating an interface for collecting and aggregating annotations across GUIs, according to some example embodiments.

FIG. 13 illustrates an example method for building an annotation agreement interface for analyzing annotations of labels or tasks within an ontology, according to some embodiments.

FIG. 14 illustrates an example method of an interface for collapsing labels or tasks into one another and displaying comparative annotation agreements, according to some embodiments.

FIG. 15 illustrates an example method of updating a work unit interface with revised guidelines, according to some example embodiments.

FIG. 16 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.

DETAILED DESCRIPTION

The following detailed description should be read with reference to the drawings when appropriate, in which identical reference numbers refer to like elements throughout the different figures. The drawings, which are not necessarily to scale, depict selective embodiments and are not intended to limit the scope of the invention. The detailed description illustrates by way of example, not by way of limitation, the principles of the invention. This description will clearly enable one skilled in the art to make and use the invention, and describes several embodiments, adaptations, variations, alternatives and uses of the invention, including what is presently believed to be the best mode of carrying out the invention. As used in this specification and the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly indicates otherwise.

Examples merely demonstrate possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.

Systems, methods, and apparatuses are presented for causing the display of certain information and efficiently sharing responsive inputs to verify the accuracy of computer modeling in natural language processing. Marketing outlets, internal compliance departments, customer service departments, or even design and engineering industries can greatly benefit from rapid analysis of communications about products and policies. In the digital age, where communication can take many forms and the volume of information within those forms is staggering, analysis of these documents by human review can only be performed in a timely fashion by choosing a small sample of the communications. Such small sampling introduces error into any conclusions by casting doubt on how representative a sample is of the whole.

Computer modeling, by contrast, can analyze an entire corpus of documents to rapidly identify the general trends and commentary across all documents by using artificial intelligence to recognize keywords, syntax, and relations within documents in a much more timely fashion. By categorizing a collection of documents into topics, with a series of descriptive labels and tasks describing each document, a natural language modeling engine can build an ontology of documents demonstrative of overall sentiments and underlying meaning of trends across all documents.

To ensure accuracy of the ontology, and to confirm that the artificial intelligence is appropriately categorizing documents, a series of interfaces is described for presenting and receiving human annotations of a natural language modeling engine's classification.

FIG. 1 illustrates a network for relating users across interfaces for annotating documents that may be classified according to an ontology created by a natural language modeling engine. Network 100 includes user 132 operating device 130, user 152 operating device 150, and network based system 105. User 132 or user 152 may be one of a project manager, expert annotator, or annotator. Device 130 or device 150 may be a mobile device, desktop computer, tablet, or other computing device configured to operate any one of the interfaces described herein. Connecting user 132 or user 152 to network based system 105 through device 130 or device 150 is network 190. Network 190 may be a wireless network (such as a wide area network or local area network), an Ethernet connection or other wired connection, or other suitable network system for linking computing devices.

Network based system 105 can include a server machine 110 configured to perform natural language modeling according to some embodiments as further described in this detailed description, and a database 115. Database 115 may store a collection of documents for server machine 110 to access and create an ontology around, or may store artificial intelligence rules for server machine 110 to access and apply to sorting a collection of documents, or may store guidelines and labels or tasks that have been used in other ontologies that server machine 110 can access to build additional ontologies. For example, if a previous ontology had been built for “JP Morgan Chase” and documents related to their investment banking practice, database 115 can store the labels, tasks, and guidelines used in that ontology to inform server machine 110 which labels, tasks, and guidelines may be relevant in a subsequent “JP Morgan Chase” ontology relating to, for example purposes only, “customer service.”
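The reuse described for database 115 can be sketched as a store of labels and guidelines keyed by client and topic, from which earlier ontologies suggest starting labels for a new one. The in-memory dictionary stands in for the database, and the class name, method names, and "first guideline wins" merge policy are hypothetical choices, not the disclosed design.

```python
from __future__ import annotations

from collections import defaultdict


class LabelStore:
    """Hypothetical stand-in for database 115: labels and guidelines per client."""

    def __init__(self) -> None:
        # client -> {topic: {label: guideline}}
        self._by_client: defaultdict[str, dict[str, dict[str, str]]] = defaultdict(dict)

    def save(self, client: str, topic: str, guidelines: dict[str, str]) -> None:
        """Record the labels and guidelines used in one finished ontology."""
        self._by_client[client][topic] = dict(guidelines)

    def suggest(self, client: str, exclude_topic: str | None = None) -> dict[str, str]:
        """Merge labels from the client's earlier ontologies; first guideline wins."""
        merged: dict[str, str] = {}
        for topic, guidelines in self._by_client.get(client, {}).items():
            if topic == exclude_topic:
                continue
            for label, text in guidelines.items():
                merged.setdefault(label, text)
        return merged
```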

In some embodiments, user 132 or 152 is a customer or other third party seeking to have a collection of documents analyzed or classified into an ontology and transfers a collection of documents to network based system 105 through device 130 or 150 via network 190.

In some embodiments, network based system 105 creates an ontology of a collection of documents, and displays a work unit interface on a second or third GUI to user 132 or user 152 through device 130 or 150 to verify the accuracy of the ontology. User 132 or 152 annotates the information in their respective interface, and network based system 105 aggregates the annotations to refine the ontology or draw further conclusions about the underlying documents and displays the results on a first GUI, which may be operated by user 132 or user 152 depending on the embodiment.

FIG. 2 illustrates a diagram of interface system 200 relating a first, second, and third GUI with a natural language modeling engine 210. In some embodiments, natural language modeling engine 210 is network based system 105 as described in FIG. 1. In some embodiments, natural language modeling engine 210 comprises a series of modules, including a database, which may be the same database as database 115 as described in FIG. 1, and an input/output (I/O) module configured to receive and transmit information throughout interface system 200. In some embodiments, natural language modeling engine 210 further comprises an API module configured to communicate through an I/O module with various devices, GUIs, and operating systems interacting with natural language modeling engine 210. In some embodiments, natural language modeling engine 210 further comprises an intelligent queuing module configured to generate human readable prompts for populating a work unit interface to elicit an annotation from expert annotators on a second GUI or annotators on a third GUI. In some embodiments, natural language modeling engine 210 further comprises an annotation module for constructing a work unit interface with at least a document from a database, a human readable prompt from an intelligent queuing module, and a label or task from a database. In some embodiments, an annotation module is further configured to aggregate received annotations from across a plurality of work unit interfaces and compute agreements and relationships between the aggregated annotations. In some embodiments, natural language modeling engine 210 further comprises a modeling module for constructing an ontology from a collection of documents on a database.
In some embodiments, a modeling module is configured to modify an ontology in response to aggregated annotations received from a plurality of work unit interfaces across second GUIs 214 or third GUIs 216, or the manipulations to an aggregated set of annotations as received from a user of a first GUI 212.
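The module split of FIG. 2 can be sketched as one small class, assuming in-memory dictionaries in place of the database and a simple majority vote as the modeling module's update rule. The source does not state how the ontology is modified, so that rule is an illustrative assumption, as are all class and method names.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class NaturalLanguageModelingEngine:
    """Hypothetical, in-memory composition of the modules named for engine 210."""
    documents: dict[str, str]  # database stand-in: document id -> text
    ontology: dict[str, str]   # document id -> currently assigned label
    annotations: defaultdict[str, Counter] = field(
        default_factory=lambda: defaultdict(Counter)
    )

    # Intelligent queuing module: produce the human readable prompt.
    def make_prompt(self, doc_id: str) -> str:
        return f'Is "{self.ontology[doc_id]}" the right label for this document?'

    # Annotation module: assemble a work unit from document, label, and prompt.
    def work_unit(self, doc_id: str) -> dict[str, str]:
        return {
            "document": self.documents[doc_id],
            "label": self.ontology[doc_id],
            "prompt": self.make_prompt(doc_id),
        }

    # Annotation module: aggregate annotations received from second/third GUIs.
    def record(self, doc_id: str, label: str) -> None:
        self.annotations[doc_id][label] += 1

    # Modeling module: adopt the annotators' majority label where it differs
    # (on a tie, Counter.most_common returns the label recorded first).
    def update_ontology(self) -> list[str]:
        changed = []
        for doc_id, counts in self.annotations.items():
            top_label, _ = counts.most_common(1)[0]
            if top_label != self.ontology.get(doc_id):
                self.ontology[doc_id] = top_label
                changed.append(doc_id)
        return changed
```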

In some embodiments, operably coupled to natural language modeling engine 210, such as by network 190 as described in FIG. 1, are first GUI 212 operated by a project manager or similar user role for managing the collection and using information created by the ontology, second GUI 214 operated by an expert annotator, and third GUI 216 operated by an annotator. Though depicted as individually coupling to natural language modeling engine 210, each of first GUI 212, second GUI 214 and third GUI 216 may, in some embodiments, be directly connected to one another, such that an input from second GUI 214 may be transmitted to third GUI 216 without intermediary communication with natural language modeling engine 210.

In some embodiments, a project manager can, through first GUI 212, access a collection of documents in natural language modeling engine 210. The project manager can select certain topics, create topics, or review a set of keywords for suggested topics provided by natural language modeling engine 210, and the modeling module of natural language modeling engine 210 can then build an ontology for that topic with a hierarchical structure of labels and tasks of the documents related to that topic.

For example, for a collection of Twitter tweets with the hashtag #Chase, a project manager may want to determine the general disposition of the tweets as, and if, they relate to Chase banking. In one embodiment, the project manager can create a topic for the tweets specifically with the name “Chase Bank” and the modeling module responds by identifying and organizing the tweets into an ontology based on that topic, with sub-labels and tasks further refining the disposition of the tweets. In some embodiments, certain tweets will be excluded, for example those relating to baseball player Chase Utley that may otherwise contain the hashtag #Chase. In some embodiments, a “relevant” and “irrelevant” label within the ontology will distinguish the #Chase tweets relating to banking from those relating to baseball.

In some embodiments, the annotation module of natural language modeling engine 210 creates a work unit interface to display to expert annotators operating second GUI 214 or annotators operating third GUI 216. As depicted, interface system 200 may include a plurality of second GUIs 214 or third GUIs 216. In some embodiments, the annotation module selects certain documents from the ontology displayed on first GUI 212 to populate a work unit interface. In some embodiments, the document is selected merely to confirm the accuracy of its label or task placement within the ontology. In some embodiments, the document is selected because the modeling module cannot determine, based on its own processing rules, which label or task of the ontology should be applied to the document.
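A hedged sketch of the two selection reasons above, assuming the modeling module exposes a confidence score per label (the source does not say so): documents whose best score falls below a threshold are queued because the engine cannot decide, and a small random sample of the remaining documents is queued for confirmation. This is a standard uncertainty-sampling heuristic swapped in for illustration; the threshold, rate, and function name are hypothetical.

```python
from __future__ import annotations

import random


def select_for_annotation(
    predictions: dict[str, dict[str, float]],
    threshold: float = 0.6,
    confirm_rate: float = 0.1,
    seed: int = 0,
) -> tuple[list[str], list[str]]:
    """Split documents into (uncertain, sampled-for-confirmation) id lists.

    predictions maps a document id to the engine's {label: score} estimates.
    """
    rng = random.Random(seed)  # seeded so the confirmation sample is repeatable
    uncertain: list[str] = []
    confident: list[str] = []
    for doc_id, scores in predictions.items():
        best = max(scores.values())
        (uncertain if best < threshold else confident).append(doc_id)
    k = round(len(confident) * confirm_rate)
    confirm = rng.sample(confident, k)
    return uncertain, confirm
```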

In one embodiment, second GUI 214 is distinguished from third GUI 216 by having a create option when displaying a work unit interface. The work unit interface is described in further detail in other parts of this disclosure. To describe the role of each interface in relation to the others: second GUI 214 or third GUI 216 receives annotations from a respective expert annotator or annotator, and the annotation module of natural language modeling engine 210 aggregates the annotations. First GUI 212 displays the aggregated annotations to a project manager to indicate the accuracy of the ontology or to suggest how the information could be categorized more accurately.

To more fully describe the capabilities and structures displayed on first GUI 212 as described in FIG. 2, FIG. 3 illustrates an example embodiment of a control page 300 displayed to a project manager or other user of first GUI 212. In some embodiments, control page 300 is displayed after a project manager or similar user has accessed a system operating the natural language modeling engine, such as the one described in FIG. 2, with a login credential identifying the user as a project manager or other similar role for managing collections. In some embodiments, control page 300 displays a collections pane 310 and a sharing and permissions pane 320.

Collections pane 310 permits a user of first GUI 212 to create a new project for analyzing a collection of documents in a natural language modeling engine by engaging a create new collection action 312, or to access previous projects by engaging a view all collections action 314. In some embodiments, sharing and permissions pane 320 permits a user of first GUI 212 to allow other users, such as user 132 or user 152, to access any analytical information of a collection of documents managed by a project manager.

FIG. 4 illustrates an example embodiment of dashboard 400 that is presented to a user after accessing the natural language modeling engine, such as the one described in FIG. 2. In some embodiments, dashboard 400 directs a user, in a features pane 410, to updates to, and new features provided by, the natural language modeling engine, to keep first GUI 212 functioning with the latest operational capabilities. Dashboard 400 in some embodiments includes a maintenance pane 412 to identify system services or other alerts, such as the availability of the natural language modeling engine.

In some embodiments, dashboard 400 includes a current projects pane 420. Current projects pane 420 is configured to display a variety of information about projects the user accessing dashboard 400 is managing or has permissions to view, such as the model accuracy of a particular project of a collection of documents, the number of annotations outstanding for that particular project, or the topics, labels, and tasks of the particular project. Current projects pane 420, in some embodiments, includes a view all projects action button to view more projects within current projects pane 420, or a create new topic action button to direct the user to a topic creation series of displays described more fully below.

In some embodiments, dashboard 400 further includes annotation pane 430 for the user of first GUI 212 to access a series of work unit interfaces created for annotating. In some embodiments, dashboard 400 is displayed on second GUI 214 or third GUI 216 as described in FIG. 2 and displays annotation pane 430 to those respective users for accessing documents for annotation by the respective user. Dashboard 400, in some embodiments, recognizes an expert annotator or annotator from a login credential and directs the expert annotator or annotator to annotation pane 430 for annotating documents through their respective work unit interfaces.

FIG. 5 illustrates a sample display for initiating the creation of an ontology from a collection of documents. Collection pane 500 is displayed on first GUI 212 and comprises a client pane 510, classification pane 520, and collection data pane 530. In some embodiments, client pane 510 includes a document action button, a topic action button, and a create task button. A document action button permits a user of first GUI 212 to access the documents within a collection associated with a particular client, such as a customer that has uploaded documents to the natural language modeling engine. A topic action button permits a user of first GUI 212 to view alternative or additional topics in a collection, so that additional ontologies can be built for those topics as needed or as determined by the user of first GUI 212. A create task button permits a user of first GUI 212 to add labels or tasks to an ontology in addition to any that may have been created by a natural language modeling engine.

In some embodiments, classification pane 520 is displayed to the user on collection pane 500 to display a suggested ontology based on previous ontologies and collections the user of first GUI 212 has used. In some embodiments, classification pane 520 is populated with other labels and tasks associated with a particular client as identified in client pane 510. For example, if a natural language modeling engine identifies, through client pane 510, a particular client that has historically and consistently used certain labels and tasks for categorizing a collection of documents, the natural language modeling engine can build a new ontology with those historic labels and tasks for a new collection of documents and display the resulting ontology in classification pane 520.
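The disclosure does not say how “consistently used” labels are identified. One minimal sketch, assuming each of a client's past ontologies is available as a list of label names and that a hypothetical `min_share` cut-off defines consistent use, counts how many previous ontologies used each label:

```python
from collections import Counter


def suggest_historic_labels(past_ontologies, min_share=0.5):
    """Return labels a client used in at least `min_share` of its past ontologies.

    past_ontologies: one iterable of labels per previous collection.
    min_share: hypothetical cut-off standing in for "consistently used".
    """
    if not past_ontologies:
        return []
    # Count each label at most once per ontology.
    counts = Counter(label for ontology in past_ontologies for label in set(ontology))
    needed = min_share * len(past_ontologies)
    kept = [label for label, n in counts.items() if n >= needed]
    # Most frequently reused labels first; ties broken alphabetically.
    return sorted(kept, key=lambda label: (-counts[label], label))


past = [
    ["relevant", "irrelevant", "positive"],
    ["relevant", "irrelevant", "negative"],
    ["relevant", "positive"],
]
print(suggest_historic_labels(past))  # ['relevant', 'irrelevant', 'positive']
```

Here “negative” appears in only one of three past ontologies, so it falls below the cut-off and is not carried into the suggested ontology.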

In some embodiments, collection data pane 530 displays documents that are part of a collection to be analyzed. In some embodiments, and as depicted in FIG. 5, collection data pane 530 includes an import data action button to initiate access to a collection of documents, such as a collection provided by a client or customer, for analysis. In some embodiments, classification pane 520 displays an ontology reactive to the documents present in collection data pane 530, such that an ontology is created, or previous ontologies are updated, as documents accessed through collection data pane 530 are interfaced with and annotated.

In some embodiments, collection pane 500 includes a discover topics action button 540 to initiate an automatic construction of an ontology through a discover topics interface more fully described in conjunction with FIG. 6.

FIG. 6 illustrates collection pane 500 configured to display a discover topic action button 610. In some embodiments, discover topic action button 610 prompts a natural language modeling engine to analyze the collection of documents accessed by first GUI 212, such as through collection data pane 530 as described in FIG. 5, for common themes and keywords. In some embodiments, a first topic suggestion 612 is displayed in collection pane 500 on first GUI 212 with a second topic suggestion 614, though many other numbers and arrangements of topic suggestions are possible. As an illustrative example, a collection of documents accessed by first GUI 212 may contain a large number of documents with the word “horse” and a large number of documents with the word “betting.” Engaging discover topic action button 610 can prompt the natural language modeling engine to form a first topic and ontology around the word “horse,” with sub labels and tasks such as “breed,” “positive,” and “diet,” and a second topic and ontology around the word “betting,” with sub labels and tasks such as “owner” and “race dates.”
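Analyzing a collection “for common themes and keywords” could be done in many ways, including full topic models. The sketch below swaps in the simplest stand-in, document frequency: each of the most widespread non-stopword words becomes a suggested topic, paired with the indices of the documents that contain it (roughly what keywords pane 615 and plurality of documents 616 display). The stopword list, tokenizer, and `suggest_topics` name are illustrative assumptions, not the engine's actual method.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "and", "of", "on", "in", "to", "is", "for", "at", "my", "our"}


def suggest_topics(documents, num_topics=2, min_docs=2):
    """Suggest topic keywords as the words that appear in the most documents.

    Returns {keyword: [indices of documents containing it]}.
    """
    doc_words = [set(re.findall(r"[a-z']+", doc.lower())) - STOPWORDS for doc in documents]
    # Document frequency: how many documents contain each word.
    df = Counter(word for words in doc_words for word in words)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [word for word, n in ranked if n >= min_docs]
    return {
        keyword: [i for i, words in enumerate(doc_words) if keyword in words]
        for keyword in candidates[:num_topics]
    }


docs = [
    "The horse breed matters for diet",
    "Horse diet tips",
    "Betting on the race dates",
    "Betting odds and the horse owner",
    "Betting tips",
]
print(suggest_topics(docs))  # {'betting': [2, 3, 4], 'horse': [0, 1, 3]}
```

Document 3 lands under both suggestions, which is the kind of overlap a user of first GUI 212 would resolve when choosing a topic.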

In some embodiments, a topic suggestion includes further displays of the keywords identified to justify the creation of the topic suggestion. For example, as depicted in FIG. 6, keywords pane 615 displays at least one keyword recognized across a plurality of the documents in a collection, and in some embodiments, a plurality of documents 616 containing that keyword is further displayed in collection pane 500. In some embodiments, classification pane 520 is configured to display the suggested topics 612 and 614 and keywords 615 as an ontology; that is, the suggested topics 612 and keywords 615 can be used in some embodiments as the labels and tasks for a collection of documents to create an ontology displayed in classification pane 520. A user of first GUI 212 can review the suggested topics, select the set of keywords identified by the natural language modeling engine that best reflects the needs of the project, and thereby select a topic for building an ontology to display in classification pane 520.

By presenting the documents associated with a keyword through plurality of documents 616, first GUI 212 lets its user see how the keywords are used, and so gauge their context rather than simply their presence, before choosing a topic and building an ontology. For example, using the above “horse” and “betting” topics: if the keyword for a document is “race” or “bet,” and plurality of documents 616 displays advertisements for stables with the lines “You will race over to get a stall in our stable” or “You can bet your family will love our horses,” a user of first GUI 212 can determine that those documents are not truly indicative of racing or betting and remove them from the ontology, or decide to choose another topic whose plurality of documents 616 is more in line with the desired keywords.

In some embodiments, keywords 615 are identified independently of topic suggestion 612 or 614. In other words, an ontology is created around keywords 615 without a threshold topic under which to group those keywords. In some embodiments, each keyword within keywords 615 is a label or task for an ontology. For example, to use the “horse” and “betting” examples from above, rather than categorizing a collection of documents that may contain the words “horse” and “betting” into distinct topics with keywords directed to those particular words, a natural language modeling engine can create an ontology from only the keywords. In these embodiments, labels such as “relevant” or “irrelevant” may be more important for distinguishing which documents are applicable to a label or task, as a threshold topic selection may not have filtered these documents.

As depicted in FIG. 7A, in some embodiments, the user of first GUI 212 can create an ontology independently through a create classification task pane 720 displayed in collection pane 500 on first GUI 212. These embodiments represent additional ways to create an ontology without a specific topic selection. In some embodiments, first GUI 212 automatically discovers suggested topics, such as suggested topics 612 and 614 depicted in FIG. 6, and displays a keyword suggestion pane 710 populated with keyword sets identified across those topics. Keyword suggestion pane 710 can determine keywords in much the same way as described for FIG. 6, by recognizing common words and identifying major themes across a collection of documents. From create classification task pane 720, the user of first GUI 212 has more autonomy in applying those keywords to create an ontology. Keyword suggestion pane 710 can allow the user to quickly assess the correlation among the keyword set, the context, the presence of slang, and other subjective factors in order to create an appropriate topic around them.

In some embodiments, a topic label pane 730 is further presented in collection pane 500 to receive a user's specific input for the labels and tasks to apply to a collection of documents. For example, though keywords can readily be used as labels or tasks, a user of first GUI 212 may have specific labels in mind for an ontology and can direct a natural language modeling engine to build an ontology on those directed labels. In those embodiments, non-intuitive relationships can be constructed that artificial intelligence may not yet be programmed for, or experienced enough, to identify on its own.

In some embodiments, once the ontology is created, either by the discover topics function of the natural language modeling engine described above or by the create classification task function dictated by the user of first GUI 212 as described above, certain documents are selected for annotation to confirm the accuracy of the placement of the document within a label of the ontology. In some embodiments, the natural language modeling engine cannot determine which label(s) are applicable to a document, does not know where to place the document in the ontology, and therefore selects the document for annotation. In some embodiments, a natural language modeling engine selects documents for verification of placement despite a high likelihood of successful categorization. In some embodiments, the natural language modeling engine constructs a work unit interface to efficiently receive annotations for such verification or placement.
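The two selection reasons above (the engine cannot place a document, or a confident placement is spot-checked) can be sketched as follows. The per-label probability dictionary, the `confidence_threshold` of 0.6, and the `verify_rate` sampling are hypothetical choices for illustration, not values taken from the disclosure.

```python
import random


def select_for_annotation(predictions, confidence_threshold=0.6, verify_rate=0.1, seed=0):
    """Choose documents to route to work unit interfaces.

    predictions: dict doc_id -> dict label -> probability (hypothetical model output).
    Returns (uncertain, verification): documents the engine cannot place, and a
    random sample of confidently placed documents to verify.
    """
    rng = random.Random(seed)
    uncertain, confident = [], []
    for doc_id, probs in predictions.items():
        # No label reaches the threshold, so the engine "cannot determine" placement.
        if max(probs.values()) < confidence_threshold:
            uncertain.append(doc_id)
        else:
            confident.append(doc_id)
    # Spot-check a fraction of the confident placements.
    k = round(verify_rate * len(confident))
    verification = rng.sample(confident, k)
    return uncertain, verification


preds = {
    "tweet-1": {"relevant": 0.55, "irrelevant": 0.45},
    "tweet-2": {"relevant": 0.97, "irrelevant": 0.03},
    "tweet-3": {"relevant": 0.08, "irrelevant": 0.92},
}
uncertain, verification = select_for_annotation(preds, verify_rate=0.5)
print(uncertain, verification)
```

Seeding the random generator keeps the verification sample reproducible between runs, which makes the selection easier to audit.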

As depicted in FIG. 7B, in some embodiments, keyword suggestion pane 710 is a visual presentation rather than a purely textual list of keywords. In some embodiments, keyword suggestion pane 710 is accompanied by topic circle graph 740, which displays possible topics based on the number of keywords supporting each topic. For example, a topic with more related keywords is displayed as a larger topic circle. Additionally, topic circle graph 740 visually displays the relationships between topics. In some embodiments, topics with more keywords in common with another topic are displayed closer together on topic circle graph 740.
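Topic circle graph 740 needs two quantities: a size for each topic and a closeness between topics. A minimal sketch, assuming circle area should be proportional to the number of supporting keywords and using Jaccard overlap of keyword sets as a stand-in measure of “keywords in common,” is:

```python
import math
from itertools import combinations


def circle_layout_inputs(topic_keywords):
    """Compute circle radii and pairwise keyword overlap for a topic circle graph.

    topic_keywords: dict topic -> set of supporting keywords.
    Returns (radii, similarity), where similarity[(a, b)] is the Jaccard overlap
    of the two topics' keyword sets (0.0 = nothing shared, 1.0 = identical).
    """
    # Area grows with keyword count, so the radius grows with its square root.
    radii = {topic: math.sqrt(len(kws)) for topic, kws in topic_keywords.items()}
    similarity = {}
    for a, b in combinations(sorted(topic_keywords), 2):
        union = topic_keywords[a] | topic_keywords[b]
        shared = topic_keywords[a] & topic_keywords[b]
        similarity[(a, b)] = len(shared) / len(union) if union else 0.0
    return radii, similarity


topics = {
    "horse": {"breed", "diet", "race", "stable"},
    "betting": {"odds", "race", "owner"},
    "banking": {"loan", "rate"},
}
radii, similarity = circle_layout_inputs(topics)
print(radii)
print(similarity)
```

A layout routine could then place each pair of circles at a target distance that shrinks as similarity grows, for example proportional to 1 - similarity.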

In some embodiments, relevant terms window 750 displays keywords across the collection of documents. In some embodiments, the frequency of a particular keyword within a topic selected by a user in topic circle graph 740 is displayed in relevant terms window 750 alongside how frequently that keyword appears across the collection as a whole. For example, if the word “banking” appears four hundred times across a collection of documents, and all four hundred of those occurrences fall within a particular topic, a user could readily deduce that “banking” is very relevant to the collection of documents and that an ontology for the collection of documents should include the word “banking.”
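The in-topic versus collection-wide comparison shown in relevant terms window 750 reduces to two counts and a ratio. This sketch assumes whitespace tokenization and that the topic's documents are a subset of the collection; a share of 1.0 corresponds to the “banking” example, where every occurrence falls within the topic.

```python
from collections import Counter


def term_relevance(topic_docs, all_docs):
    """Compare each term's count inside a topic with its count in the whole collection.

    topic_docs must be a subset of all_docs (so no collection count is zero).
    Returns term -> (in_topic, in_collection, share_in_topic).
    """
    in_topic = Counter(w for doc in topic_docs for w in doc.lower().split())
    in_collection = Counter(w for doc in all_docs for w in doc.lower().split())
    return {w: (n, in_collection[w], n / in_collection[w]) for w, n in in_topic.items()}


all_docs = ["banking fees", "loan banking", "horse race banking"]
topic_docs = all_docs[:2]
for term, (n_topic, n_total, share) in sorted(term_relevance(topic_docs, all_docs).items()):
    print(f"{term}: {n_topic} of {n_total} ({share:.0%})")
```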

FIG. 8A depicts an example expert annotator work unit interface 800, constructed by a natural language modeling engine, as presented on second GUI 214. In some embodiments, work unit interface 800 comprises a document pane 810, a prompt pane 820, a label pane 830 comprising at least one label or task, a reference button 835 paired to a label or task within label pane 830, a create label pane 840, and a create guideline pane 845.

In some embodiments, expert annotator work unit interface 800 is presented on second GUI 214 upon a user logging into a natural language modeling engine with an expert login credential. As described in FIG. 4 and annotation pane 430, second GUI 214 may display an annotation pane 430 directing the user to expert annotator work unit interface 800 for annotations from the user operating second GUI 214. Expert annotator work unit interface 800 is constructed by a natural language modeling engine displaying in document pane 810 a document from the ontology to be annotated, and listing at least one label or task from the ontology in label pane 830. In some embodiments, the natural language modeling engine accesses a database of guidelines defining or otherwise describing the label or task displayed in label pane 830 and pairs the guideline with the respective label or task. In some embodiments, reference button 835 is a link to the guideline paired with a label or task in label pane 830. The function of a reference button is further described in conjunction with FIG. 8C.

In some embodiments, the intelligent queuing module of natural language modeling engine 210, such as the one described in FIG. 2, generates a human readable prompt to elicit a label or task selection by a user of expert annotator work unit interface 800. Examples of such human readable prompts include, but are not limited to, “select the best label for the document” from a plurality of labels in label pane 830, “select all labels that apply to the document” from a plurality of labels in label pane 830, “rank the labels in order of relevance” from a plurality of labels in label pane 830, or binary response prompts such as “does this label apply to the document?” with yes and no labels in label pane 830. One of skill in the art can imagine a multitude of applicable human readable prompts. The human readable prompts are not necessarily identical across all work unit interfaces displaying the same document. In some embodiments, work unit interface 800 displays document 810′ in document pane 810 and prompt 820′ in prompt pane 820. In other embodiments, work unit interface 800 displays document 810′ in document pane 810 and prompt 820″ in prompt pane 820. By using different prompts for the same document, the natural language modeling engine can still collect annotations from work unit interface 800 while gaining a more diverse basis for aggregating information about a label.
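Varying the prompt across work unit interfaces that show the same document might look like the following. The template wording mirrors the example prompts above; the random choice of prompt type, the `build_work_units` name, and the work unit dictionary layout are assumptions made for illustration.

```python
import random

# Wording follows the example prompts in the text; the keys are illustrative.
PROMPT_TEMPLATES = {
    "best": "Select the best label for the document.",
    "all": "Select all labels that apply to the document.",
    "rank": "Rank the labels in order of relevance.",
    "binary": "Does the label '{label}' apply to the document?",
}


def build_work_units(document, labels, n_units, seed=0):
    """Create n_units work units for one document, varying the prompt type."""
    rng = random.Random(seed)  # seeded so the variation is reproducible
    units = []
    for _ in range(n_units):
        kind = rng.choice(sorted(PROMPT_TEMPLATES))
        if kind == "binary":
            # Binary prompts ask about a single label and offer yes/no answers.
            label = rng.choice(labels)
            prompt, options = PROMPT_TEMPLATES[kind].format(label=label), ["Yes", "No"]
        else:
            prompt, options = PROMPT_TEMPLATES[kind], list(labels)
        units.append({"document": document, "prompt": prompt, "labels": options})
    return units


for unit in build_work_units("doc-810", ["positive", "negative", "neutral"], 3):
    print(unit["prompt"], unit["labels"])
```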

In some embodiments, human readable prompt 820 elicits the selection or categorization of portions of text displayed in document pane 810. Examples of such prompts include, but are not limited to, “select all examples of each label within the document” with a list of labels in label pane 830, and “is the highlighted section of the document an example of this label?” with yes and no labels in label pane 830. In some embodiments, work unit interface 800 displays document 810′ in document pane 810 with a plurality of example regions of text visually distinguished from the rest of the document to assist the annotator. In some embodiments, the example regions of text are created using the API module of natural language modeling engine 210. Examples of visual representations for example regions include, but are not limited to, unique background colors that highlight the text of the example region, and underlining of the example text regions. In some embodiments, the example region is distinguished with variable degrees of visual representation to reflect a natural language modeling engine's confidence in selecting an example region as a correct example of a label or task. For example, in some embodiments, a thicker underline on an example region indicates stronger confidence than a thinner underline, as does an opaque highlighted background color compared with a semi-transparent background color.
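One way to express confidence through variable visual emphasis is a linear mapping from a confidence score to underline thickness and highlight opacity. The pixel range, the clamping, and the returned style dictionary are hypothetical; the same mapping would apply equally to the annotator interface.

```python
def region_style(confidence, min_px=1, max_px=4):
    """Map a model confidence score to underline thickness and highlight opacity.

    Scores outside [0, 1] are clamped; the pixel range is a hypothetical default.
    """
    c = min(max(confidence, 0.0), 1.0)
    return {
        "underline_px": round(min_px + c * (max_px - min_px)),  # thicker = more confident
        "highlight_opacity": round(c, 2),  # 1.0 = opaque, lower = semi-transparent
    }


for score in (0.2, 0.9):
    print(score, region_style(score))
```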

In some embodiments, prompt pane 820 is populated with the generated human readable prompt. In some embodiments, expert annotator work unit interface 800 permits the user of second GUI 214 to populate create label pane 840 with a new label. Expert annotators creating new labels for documents can distinguish certain nuances in documents that non-expert annotators or a natural language modeling engine cannot, such as legal interpretations or advanced sciences, where terms may have distinct meanings in a particular field. By entering a new label or task into create label pane 840, the expert annotator can update other work unit interfaces with the labels the expert annotator has identified, and update an ontology with more accurate categorizations. Similarly, in some embodiments, expert annotator work unit interface 800 includes a create guideline pane 845 permitting the user of second GUI 214 to populate create guideline pane 845 with a revised guideline to pair with a particular label. The revised guideline provides more descriptive information to help other expert annotators or annotators interpret the applicability of a label or task, rather than relying solely on the guideline provided by the natural language modeling engine. In some embodiments, create guideline pane 845 receives a “gold” designation from an expert annotator to indicate that a particular label or task is particularly representative of, or otherwise a good example for, the document or prompt displayed in a work unit interface. Such a “gold” or similar exemplary marker is displayed in work unit interfaces presented to other expert annotators or annotators as a guideline, explained more fully as guideline 890 in describing FIG. 8C. Such designations not only indicate that an expert annotator wants to draw attention to a particular label or task, but can also be used to train other expert annotators or annotators on what the particular label or task should represent.

In some embodiments, an annotation of a document is made on expert annotator work unit interface 800 by selecting a label or task displayed in label pane 830, and the annotation is recorded by a natural language modeling engine.

In some embodiments, subsequent to selection of a label displayed in label pane 830, work unit interface 800 immediately displays an additional human readable prompt in prompt pane 820 and populates at least one subsequent label in label pane 830 responsive to the label selected for the first human readable prompt. For example, if the expert annotator answers “Yes” to a first human readable prompt about document relevance, prompt pane 820 may immediately display an additional human readable prompt requesting the best label for the document. By contrast, if the expert annotator answers “No” to the first prompt, no additional human readable prompt is displayed.

In some embodiments, the additional human readable prompts created for prompt pane 820 match the ontology structure displayed in classification pane 520. In such embodiments, annotations of a document are made for all labels and tasks in an ontology by selecting a label or task in label pane 830 for each additional human readable prompt.
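The follow-up behavior described above (a “Yes” on relevance opens a best-label prompt, a “No” ends the sequence) and the walk through the ontology's levels can both be sketched as a small tree, where each node holds a prompt and its labels and a selected label may open a child node. `OntologyNode`, `run_work_unit`, and the `choose` callback, which stands in for the annotator's selection in the label pane, are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One level of a hypothetical ontology: a prompt, its labels, and optional sub-levels."""
    prompt: str
    labels: list
    children: dict = field(default_factory=dict)  # selected label -> next OntologyNode


def run_work_unit(root, choose):
    """Ask one prompt per ontology level until a selected label opens no further level.

    choose(prompt, labels) stands in for the annotator's selection in the label pane.
    Returns the recorded (prompt, selected_label) annotations in order.
    """
    annotations = []
    node = root
    while node is not None:
        label = choose(node.prompt, node.labels)
        annotations.append((node.prompt, label))
        node = node.children.get(label)  # e.g. "No" has no child, so the sequence ends
    return annotations


best_label = OntologyNode("Select the best label for the document.",
                          ["positive", "negative", "neutral"])
root = OntologyNode("Is this document relevant to Chase Bank?", ["Yes", "No"],
                    {"Yes": best_label})

answers = {root.prompt: "Yes", best_label.prompt: "negative"}
print(run_work_unit(root, lambda prompt, labels: answers[prompt]))
```

Deeper ontologies only need more nested `children` entries; the loop itself does not change.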

FIG. 8B illustrates an embodiment of an annotator work unit interface 850 as displayed on third GUI 216. In some embodiments, annotator work unit interface 850 is presented on third GUI 216 upon a user logging into a natural language modeling engine with an annotator login credential. As described in FIG. 4 and annotation pane 430, third GUI 216 may display an annotation pane 430 directing the user to annotator work unit interface 850 for annotations from the annotator operating third GUI 216. Annotator work unit interface 850 is constructed by displaying in document pane 860 a document from the ontology to be annotated, and listing at least one label or task from the ontology in label pane 880. In some embodiments, the natural language modeling engine accesses a database of guidelines defining or otherwise describing the label or task displayed in label pane 880 and pairs the guideline with the respective label or task. In some embodiments, reference button 885 is a link to the guideline paired with a label or task, presenting to the annotator on third GUI 216 the definition of the particular paired label or task in label pane 880. The function of a reference button is further described in conjunction with FIG. 8C.

In some embodiments, the intelligent queuing module of natural language modeling engine 210, such as the one described in FIG. 2, generates a human readable prompt to elicit a label or task selection by a user of annotator work unit interface 850. Examples of such human readable prompts include, but are not limited to, “select the best label for the document” from a plurality of labels in label pane 880, “select all labels that apply to the document” from a plurality of labels in label pane 880, “rank the labels in order of relevance” from a plurality of labels in label pane 880, or binary response prompts such as “does this label apply to the document?” with yes and no labels in label pane 880. One of skill in the art can imagine a multitude of applicable human readable prompts. The description of the prompts possible in prompt pane 820 of expert annotator work unit interface 800 applies equally to prompt pane 870. In some embodiments, prompt pane 870 of annotator work unit interface 850 is populated with the generated human readable prompt.

In some embodiments, human readable prompt 870 elicits the selection or categorization of portions of text displayed in document pane 860. Examples of such prompts include, but are not limited to, “select all examples of each label within the document” with a list of labels in label pane 880, and “is the highlighted section of the document an example of this label?” with yes and no labels in label pane 880. In some embodiments, work unit interface 850 displays document 860′ in document pane 860 with a plurality of example regions of text visually distinguished from the rest of the document to assist the annotator. In some embodiments, the example regions of text are created using the API module of natural language modeling engine 210. Examples of visual representations for example regions include, but are not limited to, unique background colors that highlight the text of the example region, and underlining of the example text regions. In some embodiments, the example region is distinguished with variable degrees of visual representation to reflect a natural language modeling engine's confidence in selecting an example region as a correct example of a label or task. For example, in some embodiments, a thicker underline on an example region indicates stronger confidence than a thinner underline, as does an opaque highlighted background color compared with a semi-transparent background color.

In some embodiments, an annotation of a document is made on annotator work unit interface 850 by selecting a label or task displayed in at least one label pane 880, and the annotation is recorded by a natural language modeling engine.

In some embodiments, subsequent to selection of a label displayed in label pane 880, work unit interface 850 immediately displays an additional human readable prompt in prompt pane 870 and populates at least one subsequent label in label pane 880 responsive to the label selected for the first human readable prompt. For example, if the annotator answers “Yes” to a first human readable prompt about document relevance, prompt pane 870 may immediately display an additional human readable prompt requesting the best label for the document. By contrast, if the annotator answers “No” to the first prompt, no additional human readable prompt is displayed.

In some embodiments, the additional human readable prompts created for prompt pane 870 match the ontology structure displayed in classification pane 520. In such embodiments, annotations of a document are made for all labels and tasks in an ontology by selecting a label or task in label pane 880 for each additional human readable prompt.

FIG. 8C illustrates the reference button function on either expert annotator work unit interface 800 or annotator work unit interface 850 and the respective reference button 835 or reference button 885. Upon selection of reference button 835 or reference button 885 by a user of second GUI 214 or third GUI 216 respectively, guideline 890 is displayed within the work unit interface. In some embodiments, the guideline 890 displayed is the guideline accessed from a database of a natural language modeling engine for a particular label; in some embodiments, the guideline 890 displayed is the revised guideline created by an expert annotator through create guideline pane 845 of expert annotator work unit interface 800 on second GUI 214, and updated on all work unit interfaces present on GUIs within an interface system 200 such as the one depicted in FIG. 2. In some embodiments, guideline 890 displays both the guideline accessed from the database of the natural language modeling engine and the revised guideline, if any, created by the expert annotator.

In some embodiments, the annotations received on all work unit interfaces 800 and 850 are aggregated according to the ontology from which the documents underlying each work unit interface were drawn. In some embodiments, the aggregation occurs in an annotation module of natural language modeling engine 210, and the aggregated annotations are shared on first GUI 212, such as depicted in FIG. 2.
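Aggregating annotations by the ontology their documents were drawn from can be sketched with nested counters. The flat record schema (`ontology`, `document`, `label` keys) is an assumption about how annotations might be stored, not a format given in the disclosure.

```python
from collections import Counter, defaultdict


def aggregate_by_ontology(annotations):
    """Count labels per document, grouped by the ontology each document was drawn from.

    annotations: iterable of dicts with "ontology", "document", and "label" keys
    (a hypothetical storage schema).
    """
    grouped = defaultdict(lambda: defaultdict(Counter))
    for record in annotations:
        grouped[record["ontology"]][record["document"]][record["label"]] += 1
    # Convert to plain dicts so the result is easy to serialize for display.
    return {ontology: {doc: dict(counts) for doc, counts in docs.items()}
            for ontology, docs in grouped.items()}


records = [
    {"ontology": "Chase Bank", "document": "tweet-1", "label": "relevant"},
    {"ontology": "Chase Bank", "document": "tweet-1", "label": "relevant"},
    {"ontology": "Chase Bank", "document": "tweet-1", "label": "irrelevant"},
    {"ontology": "Horse racing", "document": "doc-7", "label": "betting"},
]
print(aggregate_by_ontology(records))
```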

FIG. 8D illustrates a span annotation work unit interface interaction. In some embodiments, a work unit interface displays a document in a document pane and span prompt 891 in a prompt pane. In some embodiments, span prompt 891 is associated with span annotation 892 in a label pane. In some embodiments, a natural language modeling engine predicts spans of a document that represent a particular label or task and displays each predicted span as highlighted or underlined text 893 in the document displayed in the document pane of the work unit interface. One of skill in the art can appreciate other ways of visually distinguishing text for span annotation in a work unit interface. In some embodiments, an expert annotator or annotator annotates the span prediction with a span annotation 892 in the label pane.

In some embodiments, a span prompt 891 requests confirmation of multiple span types, such as “location” and “person,” and a toggle, swatch, or menu function in the document pane switches between the prompted span types and displays the different highlighted or underlined text 893 corresponding to the span type prompted in span prompt 891. For example, a span prompt requests confirmation that the highlighted or underlined text of a document represents “people” and “locations.” An expert annotator or annotator selects a toggle, swatch, or menu function in the document pane for “locations,” and the work unit interface displays those highlighted or underlined texts the natural language modeling engine has predicted correspond to “locations.” The expert annotator or annotator then annotates with a span annotation 892 in the label pane, presses the toggle, swatch, or menu function for “people,” and the work unit interface displays those highlighted or underlined texts the natural language modeling engine has predicted correspond to “people.” The expert annotator or annotator then annotates for “people,” and the natural language modeling engine processes the span annotations.

FIG. 8E illustrates a span selection work unit interface. In some embodiments, span selection prompt 894 is displayed in a prompt pane of a work unit interface, requesting that an annotator highlight, underline, or otherwise visually distinguish a span within a document pane. In some embodiments, an expert annotator or annotator annotates an otherwise unmarked document in a document pane by highlighting or otherwise visually distinguishing a span 895 within the document. In some embodiments, a natural language modeling engine indicates which span it predicts corresponds to the label or task requested in span selection prompt 894 by visually distinguishing that span differently from the way an expert annotator or annotator would. For example, for a span selection prompt 894 requesting that a user select a span for “locations,” a natural language modeling engine instructs a work unit interface to present underlined spans 896 that the natural language modeling engine predicts correspond to “locations,” and an expert annotator or annotator can confirm the prediction by highlighting the underlined span, or highlight other spans to annotate the document.
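Comparing the spans an annotator highlights with the spans the engine underlined splits the outcome into confirmed predictions, spans the annotator added, and predictions left unconfirmed. This sketch assumes spans are (start, end) character offsets with an exclusive end and credits only exact matches; partial-overlap credit would be a reasonable refinement.

```python
def compare_spans(predicted, selected):
    """Compare engine-predicted spans with annotator-selected spans.

    Spans are (start, end) character offsets with an exclusive end (an assumption);
    only exact matches count as confirmations.
    """
    predicted, selected = set(predicted), set(selected)
    return {
        "confirmed": sorted(predicted & selected),
        "added_by_annotator": sorted(selected - predicted),
        "unconfirmed": sorted(predicted - selected),
    }


text = "Chase opened a branch in Boston and one in Denver."
boston = (text.index("Boston"), text.index("Boston") + len("Boston"))
denver = (text.index("Denver"), text.index("Denver") + len("Denver"))
chase = (0, len("Chase"))
# The engine underlined "Chase" and "Boston"; the annotator highlighted "Boston" and "Denver".
result = compare_spans(predicted=[chase, boston], selected=[boston, denver])
print({kind: [text[s:e] for s, e in spans] for kind, spans in result.items()})
```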

FIG. 9A illustrates an annotation agreement interface 900 displayed on first GUI 212 with a label feedback pane 910, annotation feedback pane 920, and rules pane 930. As further illustrated in FIG. 9A, in some embodiments, label feedback pane 910 includes a label-by-label description of each label within an ontology, with the number of annotations applied to the label and an option to delete or edit the label. From label feedback pane 910, a user of first GUI 212 can remove a label from an ontology if the annotation agreement for that label is low or the user of first GUI 212 determines it is not applicable to the ontology, or can edit the label, for example after reviewing a label created through the create label action of second GUI 214 and determining that label to be more applicable or descriptive. Similarly, the user of first GUI 212 can determine that not enough annotations have been applied to the label to draw any conclusions and decide to wait before making any adjustments to that label.

FIG. 9B illustrates an example of a learning curve 915 within label feedback pane 910. In some embodiments, learning curve 915 is a graphical representation of the relationship between the number of annotations received for a particular label and the agreement between the annotations for the label. In some embodiments, learning curve 915 is a graphical representation of the relationship between the number of annotations received for a particular label and the accuracy of the natural language model generated for that label. Annotation agreement for a label, in some embodiments, is calculated by an annotation module of a natural language modeling engine, such as natural language modeling engine 210 as depicted in FIG. 2. In some embodiments, the annotation agreement is a number indicating the incidence rate of mutually agreed annotations among all annotators operating second GUIs 214 or third GUIs 216.

For example purposes only, in one way to calculate an annotation agreement, if 10 annotators annotated a document with a label of “positive” and 10 annotators did not annotate the same document as “positive,” then an annotation agreement of 0.50 or 50% would be reflected for the “positive” label or task of the document, and learning curve 915 of those labels or tasks would depict the 0.50 or 50% agreement for 20 annotations. In the same example, if the next 20 annotators gave a “positive” annotation to the same document, the annotation agreement would update to 75% (30 of 40) for a “positive” annotation for 40 annotations, and learning curve 915 would graphically depict this relationship between annotation agreement and the number of annotations. One of skill in the art can appreciate other annotation agreement calculation methods.
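A minimal Python sketch of the worked example above follows. It assumes that the agreement for a label is simply the share of collected annotations that applied that label, which is only one of the calculation methods the text allows. The function names `label_agreement` and `learning_curve` are hypothetical; each learning curve point pairs a cumulative annotation count with the agreement at that count.

```python
from typing import List, Sequence, Tuple


def label_agreement(applied: Sequence[bool]) -> float:
    """Share of annotations that applied the label (True) out of all annotations."""
    if not applied:
        raise ValueError("at least one annotation is required")
    return sum(applied) / len(applied)


def learning_curve(batches: Sequence[Sequence[bool]]) -> List[Tuple[int, float]]:
    """Return (cumulative annotation count, agreement) after each batch arrives."""
    seen: List[bool] = []
    points: List[Tuple[int, float]] = []
    for batch in batches:
        seen.extend(batch)
        points.append((len(seen), label_agreement(seen)))
    return points


# 10 "positive" and 10 non-"positive" annotations, then 20 more "positive" ones.
first_batch = [True] * 10 + [False] * 10
second_batch = [True] * 20
print(learning_curve([first_batch, second_batch]))  # [(20, 0.5), (40, 0.75)]
```

Plotting these points against the annotation count gives the shape that learning curve 915 depicts.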

In some embodiments, the accuracy of the natural language modeling engine in assigning documents to appropriate labels or tasks of an ontology is derived from cross-validation processes of the annotations used in a learning curve. In some embodiments, a modeling module of a natural language modeling engine, such as natural language modeling engine 210 as depicted in FIG. 2, performs cross-validation on the annotation dataset to determine ontology accuracy. One of skill in the art can appreciate applicable cross-validation techniques to apply to an annotation dataset, such as exhaustive or non-exhaustive methods.
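The cross-validation step can be illustrated with a short, dependency-free sketch. It assumes the annotation dataset is a list of (document text, annotated label) pairs. A hypothetical majority-label baseline stands in for the modeling module's actual learner, which the text does not specify. When `k` equals the number of examples, the split becomes leave-one-out, an exhaustive method; a smaller `k` gives ordinary, non-exhaustive k-fold cross-validation.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (document text, annotated label)
Model = Callable[[str], str]


def fit_majority_label(train: Sequence[Example]) -> Model:
    """Hypothetical baseline learner: always predict the most common training label."""
    label = Counter(lbl for _, lbl in train).most_common(1)[0][0]
    return lambda _text: label


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validated_accuracy(
    data: Sequence[Example],
    k: int,
    fit: Callable[[Sequence[Example]], Model] = fit_majority_label,
) -> float:
    """Train on k-1 folds, score the held-out fold, and pool accuracy over all folds."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and the number of examples")
    correct = 0
    for fold in k_fold_indices(len(data), k):
        held_out = set(fold)
        model = fit([ex for i, ex in enumerate(data) if i not in held_out])
        correct += sum(model(data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)


annotations = [("doc a", "pos"), ("doc b", "pos"), ("doc c", "pos"), ("doc d", "neg")]
print(cross_validated_accuracy(annotations, k=4))  # leave-one-out: 0.75
```

Passing a different `fit` function swaps in a real learner without changing the fold logic.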

FIG. 10A illustrates an example display of an annotation feedback pane 920 with annotation agreement score 1010, individual annotator agreement list 1030, and suggested label collapse list 1020. In some embodiments, annotation agreement score 1010 displays an aggregate annotation agreement result as a proxy for the accuracy of an ontology. For example, in some embodiments, the annotation agreement score 1010 is determined by aggregating all annotations from second or third GUIs 214 or 216 for a particular set of documents that have been categorized into an ontology. Such an aggregation, in some embodiments, reflects the total annotation agreement of the whole ontology and gives a project manager or other user of a first GUI 212 a rapid feedback mechanism for how well the natural language modeling engine categorized a collection of documents, based on how consistently human readers agreed that a particular label or task applied to the same collection of documents. Such feedback can inform a project manager or other user of a first GUI 212 of the degree of agreement among annotators on whether the labels or tasks presented on a work unit interface accurately reflected the document displayed on the work unit interface. For example, a low agreement score would indicate that the labels or tasks may have been too vague or inapplicable, that the annotators could not agree on whether, or which of, the labels presented in a label pane applied to the document of the work unit interface, and therefore that perhaps the natural language modeling engine did not have the most appropriate series of labels or tasks to categorize the ontology. By contrast, a higher agreement score could indicate that the annotators found at least one label of a label pane was an accurate match to the document based on the generated prompt, and therefore that the annotators found at least one of the labels readily applied and the natural language modeling engine accurately captured the label or task of at least some documents.

In some embodiments, annotation agreement score 1010 can be broken down into a per-label agreement, and suggested label collapse list 1020 can indicate which labels or tasks introduced higher disagreement among annotators and display the annotation agreement score that would result if certain labels or tasks were collapsed (that is, combined) with each other. For example, as depicted in FIG. 10A, the suggested label collapse list 1020 displays the resulting annotation agreement score 1010 of collapsing certain labels or tasks into one of several other labels or tasks of a hypothetical ontology (e.g., collapsing “Legal” with “Securities issues” and “JPMC Financial” with “Other”). In this example embodiment of FIG. 10A, collapsing “Legal” with “Other” results in an annotation agreement score 1010 of 0.773, as compared to an annotation agreement score 1010 of 0.741 if the two were separate labels or tasks. As this represents the largest increase in annotation agreement score 1010 within the suggested label collapse list 1020, a user of first GUI 212 can readily deduce that annotators had greater difficulty distinguishing “Legal” from “Other” than they did distinguishing any other two labels or tasks within the ontology. A user of first GUI 212 can make several other deductions from this information, such as whether the guideline describing “Legal” or “Other” sufficiently describes the label or task, whether the prompt should be changed to permit more nuanced distinctions, or whether the ontology itself should not include a particular label or task.
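One possible way to produce a suggested label collapse list is sketched below. The text does not fix the agreement metric, so this sketch assumes per-document pairwise percent agreement, averaged over documents. `collapse` and `suggested_collapses` are hypothetical names, and the toy annotations are illustrative, not the data behind FIG. 10A.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Annotations = Dict[str, List[str]]  # document id -> label chosen by each annotator


def pairwise_agreement(labels: Sequence[str]) -> float:
    """Fraction of annotator pairs on one document that chose the same label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # a single annotation cannot disagree with itself
    return sum(a == b for a, b in pairs) / len(pairs)


def overall_agreement(ann: Annotations) -> float:
    """Mean pairwise agreement across all documents (the aggregate score)."""
    return sum(pairwise_agreement(v) for v in ann.values()) / len(ann)


def collapse(ann: Annotations, keep: str, merge: str) -> Annotations:
    """Relabel every `merge` annotation as `keep`, i.e. combine the two labels."""
    return {doc: [keep if lbl == merge else lbl for lbl in labels] for doc, labels in ann.items()}


def suggested_collapses(ann: Annotations) -> List[Tuple[str, str, float]]:
    """Score every label pair by the agreement that results from collapsing it, best first."""
    labels = sorted({lbl for labels in ann.values() for lbl in labels})
    rows = [(a, b, overall_agreement(collapse(ann, a, b))) for a, b in combinations(labels, 2)]
    return sorted(rows, key=lambda row: row[2], reverse=True)


ann = {
    "d1": ["Legal", "Other", "Legal"],
    "d2": ["Other", "Other", "Other"],
    "d3": ["Finance", "Finance", "Finance"],
}
print(round(overall_agreement(ann), 3))  # 0.778
print(suggested_collapses(ann)[0])       # ('Legal', 'Other', 1.0)
```

The pair at the top of the list is the one annotators found hardest to tell apart, which matches how the text reads the “Legal” and “Other” row.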

In some embodiments, an individual annotator agreement list 1030 displays how well a particular annotator within the aggregation of annotations agrees with other annotators. For example, as displayed in FIG. 10A, the annotator identified as “demo4” has an agreement value of 0.601, the lowest of the annotator group depicted in FIG. 10A, indicating that demo4 has a low incidence of agreeing with the other annotators (i.e., “demo1,” “demo2,” and “demo3”) on a label or task as prompted in a work unit interface. As depicted in FIG. 10A, annotator demo4 has also annotated only 100 documents, compared to the other annotators' 550. This display of information could suggest that annotator demo4 may need to be removed from the annotation group (for example, because the subject matter confuses that annotator) so that annotation agreement score 1010 is computed without annotators who may not understand the subject matter and could be giving false positives on applicable labels to a natural language modeling engine. This information could also indicate that annotator demo4 needs to be retrained on a particular subject matter, depending on the degree of disagreement. For example, if annotator demo4 selected labels of “slightly positive” where other annotators selected “positive,” then a user of first GUI 212 could decide to retrain annotator demo4. However, if annotator demo4 selected labels of “very negative” where other annotators selected “very positive,” then a user of first GUI 212 could decide to remove annotator demo4 from the analysis. In other instances, a user of first GUI 212 could simply decide to wait for demo4 to annotate more documents to see whether demo4's agreement value increases; such information can further be used to determine whether labels should be collapsed or whether a project needs further annotation before drawing conclusions on the natural language modeling engine's accuracy.
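The individual annotator agreement list and the retrain-versus-remove distinction can be sketched as follows. This sketch rests on three assumptions that the text does not state. First, annotations are stored as document id mapped to {annotator: label}. Second, an annotator's agreement is the fraction of comparisons with co-annotators on the same documents that match. Third, disagreement severity is measured by distance along an ordinal label scale. The `SCALE` list is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

Annotations = Dict[str, Dict[str, str]]  # document id -> {annotator: label}

# Hypothetical ordinal sentiment scale; index distance measures how far apart two labels are.
SCALE = ["very negative", "negative", "slightly negative",
         "slightly positive", "positive", "very positive"]


def annotator_agreement(ann: Annotations) -> Dict[str, Tuple[Optional[float], int]]:
    """Per annotator: (share of matching comparisons with co-annotators, documents annotated)."""
    matches: Dict[str, int] = defaultdict(int)
    comparisons: Dict[str, int] = defaultdict(int)
    docs: Dict[str, int] = defaultdict(int)
    for by_annotator in ann.values():
        for who, label in by_annotator.items():
            docs[who] += 1
            for other, other_label in by_annotator.items():
                if other != who:
                    comparisons[who] += 1
                    matches[who] += int(label == other_label)
    return {
        who: (matches[who] / comparisons[who] if comparisons[who] else None, docs[who])
        for who in docs
    }


def mean_scale_distance(ann: Annotations, who: str) -> Optional[float]:
    """Average SCALE distance between `who` and each co-annotator on shared documents.

    A small distance suggests retraining; a large one suggests removal.
    """
    distances = [
        abs(SCALE.index(labels[who]) - SCALE.index(other_label))
        for labels in ann.values() if who in labels
        for other, other_label in labels.items() if other != who
    ]
    return sum(distances) / len(distances) if distances else None


ann = {
    "d1": {"demo1": "positive", "demo2": "positive", "demo4": "slightly positive"},
    "d2": {"demo1": "negative", "demo2": "negative", "demo4": "negative"},
}
print(annotator_agreement(ann)["demo4"])  # (0.5, 2)
print(mean_scale_distance(ann, "demo4"))  # 0.5
```

In this sketch, demo4's matching rate is low, but the small scale distance points toward retraining rather than removal.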

FIG. 10B illustrates an agreement per label graphical representation 1040 and a collapsed agreement per label graphical representation 1045. In some embodiments, agreement per label graphical representation 1040 displays individual annotation agreements for each label of an ontology that, taken together, would comprise annotation agreement score 1010 as depicted in FIG. 10A. Breaking down the annotation agreements into a per-label graphical representation informs a project manager or user of first GUI 212 which labels or tasks, relative to others, had the most agreement, and can inform whether the label or task is likely an appropriate reflection of documents in a work unit interface.

In some embodiments, the annotation agreement interface includes a collapsed agreement per label graphical representation 1045 configured to display the per-label annotation agreements if two or more labels were collapsed into one another. As illustrated for example purposes in FIG. 10B, collapsed agreement per label graphical representation 1045 indicates that by collapsing two labels with respective agreements of 0.613 and 0.274 from agreement per label graphical representation 1040 into one another, a new agreement score of 0.59 results for that combined label. This suggests that the annotation agreement score 1010 as illustrated in FIG. 10A will improve if these two labels are combined, and enables similar deductions by a project manager or user of first GUI 212 as described in connection with those functions of FIG. 10A.

FIG. 10C illustrates a per document agreement list 1050 within an annotation feedback pane 920. In some embodiments, a per document agreement list 1050 displays those documents within the ontology that have the highest or lowest agreement among annotators. In some embodiments, the number of documents displayed in per document agreement list 1050 can be adjusted by a user of first GUI 212. In some embodiments, the per document agreement list 1050 displays those documents within a subset of labels or tasks of an ontology with the highest or lowest agreement among annotators. By flagging the documents with the highest or lowest agreement in per document agreement list 1050, a user of first GUI 212 can choose to remove certain documents with low agreement to reduce the number of potentially vague or inapplicable documents within a collection (as indicated by not having strong human agreement on the applicable labels or tasks). Alternatively, the user can review the content of a particular document and the label or task of a work unit interface displaying the document to determine whether the document contains a nuance that should be included as a new label or whether a guideline should be clarified to account for such nuances.
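A per document agreement list could be ranked as in the sketch below. The sketch assumes per-document agreement is the fraction of annotator pairs that chose the same label (the text does not specify the metric). It also assumes `n` is the user-adjustable list length. `per_document_extremes` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pairwise_agreement(labels: Sequence[str]) -> float:
    """Fraction of annotator pairs on one document that chose the same label."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 1.0


def per_document_extremes(ann: Dict[str, List[str]], n: int) -> Tuple[List[str], List[str]]:
    """Return (n highest-agreement document ids, n lowest-agreement document ids)."""
    ranked = sorted(ann, key=lambda doc: pairwise_agreement(ann[doc]))  # ascending agreement
    return list(reversed(ranked))[:n], ranked[:n]


ann = {
    "d1": ["Legal", "Legal", "Legal"],
    "d2": ["Legal", "Other", "Finance"],
    "d3": ["Legal", "Legal", "Other"],
}
highest, lowest = per_document_extremes(ann, n=1)
print(highest, lowest)  # ['d1'] ['d2']
```

In this example, d1 (unanimous) would serve as an exemplar, and d2 (no two annotators agree) is a candidate for review or removal.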

As depicted in FIG. 11, an annotation agreement interface can further display a rules pane 930 for adjusting any of the natural language modeling engine logic processes for certain inputs. In some embodiments, the rules pane comprises a phrase pane 1122 configured to display a data field for receiving a phrase or word that, if found within a document by a natural language modeling engine, will invoke a rule. In some embodiments, rules pane 930 displays a weighting adjustment pane 1124. In some embodiments, weighting adjustment pane 1124 is configured to receive from a user of first GUI 212 a manipulation of a certain phrase or word in phrase pane 1122 to emphasize or de-emphasize that word in placing a document in a label or task category of an ontology. For example, in an ontology with the word “recommend” as a label or task classification, a natural language modeling engine may categorize an incidence of the word “recommend” as equivalent to “recommendation” without recognizing the context of the complete phrase or document in which “recommend” appears, such as “does not recommend,” which would not imply a positive recommendation. Weighting adjustment pane 1124 permits a user of first GUI 212 to reduce or increase the significance of certain words, thereby placing greater or less emphasis on other words in the document relative to the word or phrase in phrase pane 1122. Such weighting can focus the ontology on which labels or tasks to create, or on which documents should be selected for annotation. Continuing the previous example, if a document includes the word “recommend” but that word has a low weighting, and therefore the user of first GUI 212 does not consider it important, the natural language modeling engine may not select the document for annotation, to avoid spending an annotator's time accurately placing a document with a low weighting.
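The effect of a weighting adjustment can be pictured with a hypothetical keyword-scoring model. In this model, each term associated with a label contributes its count multiplied by a user-set weight (default 1.0). Only documents whose weighted score meets a threshold are queued for annotation. The scoring scheme, `weighted_label_score`, `select_for_annotation`, and the threshold are all assumptions for illustration, not the engine's actual logic.

```python
import re
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; apostrophes are kept so "isn't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def weighted_label_score(text: str, label_terms: Sequence[str], weights: Dict[str, float]) -> float:
    """Sum of term counts, each scaled by its weight (hypothetical weighting pane value)."""
    tokens = tokenize(text)
    return sum(weights.get(term, 1.0) * tokens.count(term) for term in label_terms)


def select_for_annotation(docs: Sequence[str], label_terms: Sequence[str],
                          weights: Dict[str, float], threshold: float) -> List[str]:
    """Queue only documents whose weighted score reaches the threshold."""
    return [d for d in docs if weighted_label_score(d, label_terms, weights) >= threshold]


docs = ["We recommend this vendor.", "The report is due Friday."]
print(select_for_annotation(docs, ["recommend"], {}, threshold=0.5))                  # ['We recommend this vendor.']
print(select_for_annotation(docs, ["recommend"], {"recommend": 0.2}, threshold=0.5))  # []
```

Lowering the weight of “recommend” drops the first document below the threshold, so annotator time is not spent on it.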

In some embodiments, rules pane 930 includes add rule pane 1126. Add rule pane 1126 permits a user of first GUI 212, or in some embodiments an expert annotator operating second GUI 214, to create a rule for a particular phrase or word in phrase pane 1122. For example, if the word “recommend” appears in a document, add rule pane 1126 could receive in its data field a rule such as an “if this then that” logic rule, or a rule to search for additional words surrounding a word in phrase pane 1122. To continue the previous example, add rule pane 1126 could receive a rule to search for preceding words such as “no,” “does not,” “isn't,” or other similar negative implicative words, such that if “recommend” is paired with such a negative implicative word, the natural language modeling engine will not categorize the document as an affirmative “recommendation.”
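The negation rule in this example amounts to checking a small window of preceding tokens before accepting a match. The sketch below assumes a two-token look-back window. It also uses a hypothetical negator list built from the examples in the text. `affirmative_match` is an illustrative name, and matching is by exact token, so inflected forms such as “recommended” are not matched.

```python
import re
from typing import List, Sequence

# Hypothetical negators drawn from the examples above ("no", "does not", "isn't").
NEGATORS = {"no", "not", "isn't", "doesn't", "never"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; apostrophes are kept so "isn't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def is_negated(tokens: Sequence[str], i: int, window: int = 2) -> bool:
    """True if any of the `window` tokens before position i is a negator."""
    return any(t in NEGATORS for t in tokens[max(0, i - window):i])


def affirmative_match(text: str, phrase: str = "recommend") -> bool:
    """True if `phrase` occurs at least once without a negator shortly before it."""
    tokens = tokenize(text)
    return any(t == phrase and not is_negated(tokens, i) for i, t in enumerate(tokens))


print(affirmative_match("I recommend this vendor."))    # True
print(affirmative_match("She does not recommend it."))  # False
print(affirmative_match("He doesn't recommend it."))    # False
```

A fixed window keeps the rule simple but can miss negations placed farther from the phrase, which is one reason such rules are refined through add rule pane 1126 rather than hard-coded.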

Taken together, label feedback pane 910, annotation feedback pane 920, and rules pane 930 of an annotation agreement interface of a first GUI 212 permit rapid analysis of an ontology that has been annotated through second and third GUIs 214 and 216. Annotation agreement interface 900 further provides access to a variety of tools to determine where an ontology and its attendant labels or tasks can be refined to more accurately determine the underlying meaning of a collection of documents, or to store information for future ontologies to learn from. For example, if a revised guideline for a label results in an improved annotation agreement score 1010 of an ontology, the natural language modeling engine can store that revised guideline in a database and use that guideline for future ontologies that use the same label or task the revised guideline describes. The wealth of information and deductions that human annotations make available to computer analysis tools, and the insights such annotations provide, can greatly improve mass classification of human communications.

FIG. 12 illustrates an example of process 1200 for verifying the accuracy of a natural language modeling engine's creation of an ontology of a collection of documents by aggregating human annotations across a series of GUIs. Process 1200 starts at 1210 with accessing a series of inputs through a first GUI, such inputs being those made by a project manager or similar supervisory role with respect to a collection of documents.

In some embodiments, accessing inputs at 1210 includes accessing at least one document at 1212, such document provided by a third party. A third party access source, in some embodiments, is a customer that provides a collection of documents to be analyzed; in some embodiments, the third party access source is a database of a collection of documents, such as the database 115 as depicted in FIG. 1. In some embodiments, accessing inputs at 1210 further includes accessing at least one first label associated with the document at 1214. In some embodiments, the first label is accessed from an ontology created around the document as built by a natural language modeling engine. In some embodiments, accessing inputs at 1210 further includes accessing a plurality of first guidelines describing the first label at 1216. In some embodiments, the first guideline is sourced from a database of guidelines, such as one operated by a natural language modeling engine, that are associated with a list of labels, and each first guideline is a description or definition of the label or task.

In some embodiments, at 1220 a second label or second guideline is accessed, such as from a second GUI 214 like the one operated by an expert annotator described in FIG. 2. Collectively, the inputs accessed at 1210 and the second guidelines and second labels accessed at 1220 provide a plurality of information components for the construction of a work unit interface that permits human annotation of selected documents.

At 1230, in some embodiments, a work unit interface is built. In some embodiments, building the work unit interface at 1230 involves assigning (which, depending on the embodiment, can mean “populating” or “placing”) the document accessed at 1212 to a document pane of the work unit interface at 1231. In some embodiments, at 1232 labels are assigned to, and populated in, a label pane of a work unit interface. In some embodiments, the labels assigned to the label pane at 1232 are the first labels accessed at 1214 from an ontology of a collection of documents through a first GUI. In some embodiments, the labels or tasks assigned to the label pane at 1232 are the second labels accessed at 1220 from a second GUI. In some embodiments, the label pane is assigned a plurality of labels, and in still other embodiments, the plurality of labels assigned to the label pane at 1232 includes both first labels and second labels.

At 1233, a human readable prompt is generated to elicit a response from a human annotator that requests a task of the document. In some embodiments, the human readable prompt is generated by an intelligent queuing module of a natural language modeling engine. In some embodiments, the human readable prompt is a question requesting selection of the most applicable label or task assigned in the label pane at 1232 for the document assigned in the document pane at 1231. In some embodiments, the human readable prompt requests selection of all applicable labels or tasks assigned in the label pane of a work unit interface at 1232. One having skill in the art can envision additional human readable prompts requesting a task of a document. At 1234, the generated human readable prompt is assigned to a prompt pane of the work unit interface.

At 1235, a single guideline from one of the first guidelines accessed at 1216 or second guidelines accessed at 1220 is paired with a single label or task assigned to the label pane of the work unit interface at 1232. In some embodiments, several labels or tasks are assigned to the label pane at 1232, and at 1235 each of several single guidelines is paired with a respective one of the several labels or tasks. In some embodiments, after pairing the single guideline with a single label at 1235, a reference button is created for the single guideline at 1236. A reference button permits access to the full textual description of the single guideline paired with the single label without requiring the single guideline to be displayed in the label pane. In some embodiments, the reference button is placed at 1237 adjacent to the single label paired with the single guideline in the label pane.
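Steps 1230 through 1237 can be modeled as assembling a simple data structure. In this sketch, the class and field names (`WorkUnit`, `LabelEntry`, `reference`) are assumptions, as are the prompt wording and the de-duplication of first and second labels. An actual work unit interface would render these panes in a GUI rather than hold them in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class LabelEntry:
    label: str
    guideline: Optional[str] = None     # full guideline text, shown only on demand
    has_reference_button: bool = False  # steps 1236-1237: button placed next to the label


@dataclass
class WorkUnit:
    document: str                       # document pane (step 1231)
    prompt: str                         # prompt pane (steps 1233-1234)
    labels: List[LabelEntry] = field(default_factory=list)  # label pane (step 1232)

    def reference(self, label: str) -> Optional[str]:
        """Simulate pressing a label's reference button: return its paired guideline."""
        for entry in self.labels:
            if entry.label == label and entry.has_reference_button:
                return entry.guideline
        return None


def build_work_unit(document: str, first_labels: Sequence[str], second_labels: Sequence[str],
                    guidelines: Dict[str, str], select_all: bool = False) -> WorkUnit:
    # Step 1232: first and second labels, de-duplicated while preserving order.
    names = list(dict.fromkeys([*first_labels, *second_labels]))
    # Step 1233: one of the two prompt styles described in the text.
    prompt = ("Select all labels that apply to this document." if select_all
              else "Select the label that best applies to this document.")
    # Steps 1235-1237: pair one guideline per label and add a reference button when one exists.
    entries = [LabelEntry(name, guidelines.get(name), name in guidelines) for name in names]
    return WorkUnit(document, prompt, entries)


unit = build_work_unit("The service was excellent.", ["positive", "negative"],
                       ["neutral", "positive"], {"positive": "Expresses approval."})
print([e.label for e in unit.labels])  # ['positive', 'negative', 'neutral']
print(unit.reference("positive"))      # Expresses approval.
```

Because the guideline is stored on the entry but returned only through `reference`, the label pane can stay compact, as the reference button at 1236 intends.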

In some embodiments, process 1200 continues at 1240 by displaying the built work unit interface to an annotator operating a second GUI or third GUI. In some embodiments, the annotator operating the second GUI is an expert annotator. In some embodiments, the annotator operating the third GUI is an annotator. At 1250, at least one annotation is received through the work unit interface from among the second or third GUIs. At 1260, the annotations received at 1250 are aggregated together.

As depicted in FIG. 13, in some embodiments, process 1300 provides a method of creating interfaces to efficiently manage and manipulate annotated documents to verify and draw conclusions as to the accuracy of a natural language modeling engine. Method 1300 begins at 1310, and in some embodiments step 1310 follows step 1260 as described in FIG. 12. At 1310, an annotation agreement interface is built.

In some embodiments, building an annotation agreement interface at 1310 includes a series of substeps 1312, 1314, 1316, and/or 1318. In some embodiments, at 1312 a label feedback pane is created. A label feedback pane includes a plurality of labels or tasks of an ontology from a collection of documents, and in some embodiments further includes a description of the label or task, an indicator of the number of annotations, or action buttons to edit or delete the label or task from the ontology. Editing, in some embodiments, may include removing annotations from the label or task, or applying a new guideline describing the label or task.

In some embodiments, at 1314 a learning curve pane is created. In some embodiments, at 1314 the learning curve pane displays an aggregation of annotations from among GUIs in a network, such as network 200 as depicted in FIG. 2 with a plurality of second GUIs 214 or third GUIs 216. In some embodiments, at 1314 the learning curve pane displays a graphical representation of the relationship between the number of annotations received and the agreement between those annotations. In some embodiments, at 1314 the learning curve pane displays a graphical representation of the relationship between the number of annotations received for a particular label and the accuracy of the natural language model generated for that label.

In some embodiments, the learning curve displayed at 1314 is a learning curve for an entire collection. In some embodiments, the learning curve displayed at 1314 is a learning curve for a particular label or task. By displaying a learning curve, a project manager or user of first GUI 212 can determine whether the annotations are beginning to smooth out and/or approach a consistent agreement regardless of additional annotations, or whether additional annotations introduce continued variability of agreement (which would be represented as a staggered line in a learning curve). With such information, a project manager or user of first GUI 212 can allocate annotators efficiently, such as by ceasing to request annotations for a particular label or task or collection of documents if it is apparent that additional annotations will not appreciably affect the agreement. Or, in instances with a high degree of variability with additional annotations, a project manager or user of first GUI 212 could assign additional annotators to provide more human oversight in an attempt to reach a consistent agreement level.
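A project manager's judgment about whether a learning curve has smoothed out could be approximated numerically. This sketch assumes that the curve is a list of (annotation count, agreement) points. It treats the curve as stable when the last `window` agreement values lie within `tolerance` of one another. Both parameters, and the name `has_plateaued`, are hypothetical defaults rather than values from the text.

```python
from typing import Sequence, Tuple


def has_plateaued(curve: Sequence[Tuple[int, float]], window: int = 3,
                  tolerance: float = 0.02) -> bool:
    """True when the most recent `window` agreement values vary by at most `tolerance`."""
    if len(curve) < window:
        return False  # not enough points to judge yet
    recent = [agreement for _, agreement in curve[-window:]]
    return max(recent) - min(recent) <= tolerance


steady = [(20, 0.50), (40, 0.75), (60, 0.74), (80, 0.75), (100, 0.75)]
jagged = [(20, 0.50), (40, 0.75), (60, 0.60), (80, 0.72)]
print(has_plateaued(steady), has_plateaued(jagged))  # True False
```

A `True` result corresponds to the case where further annotation requests could stop. A `False` result on a jagged curve corresponds to assigning more annotators.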

In some embodiments, method 1300 continues to 1316 by creating an annotation feedback pane. An annotation feedback pane can be configured to display a plurality of data. In some embodiments, an annotation feedback pane displays an annotation agreement score. An annotation agreement score displays the overall agreement between annotators of the whole collection of documents within the ontology being analyzed. It can indicate the general disposition or accuracy of the entire ontology and whether further by-label or by-task analysis or manipulation is warranted given the overall agreement score, or whether annotators should be retrained on the definition of particular categories.

In some embodiments, at 1316 an annotation feedback pane displays an individual annotator agreement list with the agreement scores and number of annotations per individual annotator within the collection. Such an individual annotator agreement list indicates whether certain annotators have processed all documents presented to them on work unit interfaces of their respective GUIs, or whether certain annotators appear to have trouble with the subject matter, as indicated by low agreement scores relative to other annotators. For example, if an overall agreement, such as one displayed in an annotation agreement score, were 0.7, with an individual annotator agreement list displaying individual agreements of 0.75, 0.55, 0.7, and 0.77, a project manager could determine that the individual annotator with the 0.55 score should be dropped from the annotation agreement score.
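The example above can be reproduced with a simple rule: flag any annotator whose agreement falls more than a margin below the group mean. The margin of 0.1, the name `flag_low_agreement`, and the annotator ids are assumptions. Recomputing the mean of the remaining individual scores is also a simplification, since the real score would be recomputed from the underlying annotations. With the four scores from the example, the group mean is about 0.69, which is consistent with the roughly 0.7 overall agreement. Only the 0.55 annotator is flagged.

```python
from statistics import mean
from typing import Dict, List


def flag_low_agreement(scores: Dict[str, float], margin: float = 0.1) -> List[str]:
    """Return annotators whose agreement is more than `margin` below the group mean."""
    cutoff = mean(scores.values()) - margin
    return [who for who, score in scores.items() if score < cutoff]


scores = {"a1": 0.75, "a2": 0.55, "a3": 0.70, "a4": 0.77}
flagged = flag_low_agreement(scores)
print(flagged)  # ['a2']

# Simplified recomputation: mean of the remaining individual scores.
remaining = {who: s for who, s in scores.items() if who not in flagged}
print(round(mean(remaining.values()), 2))  # 0.74
```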

In some embodiments, at 1316 a suggested label collapse list is created. A suggested label collapse list enables a project manager or user of a first GUI to quickly identify the strengths and weaknesses of an ontology and of a natural language modeling engine's ability to sort documents into a particular label or task based on the document's content. In some embodiments, a suggested label collapse list is created by pairing annotated labels or tasks together and displaying how the annotation agreement score would be affected if the two labels or tasks were combined into a common label or task; in other words, how an annotation agreement score would be affected if annotators were not required to distinguish between certain labels or tasks. Such a feature can indicate whether the natural language modeling engine selected appropriate labels or tasks, or whether guidelines paired with labels or tasks sufficiently describe the label or task.

In some embodiments, at 1316 an agreement per label graphical representation is created by displaying a bar graph of agreement among annotators for one or more given labels or tasks of the ontology being annotated. In some embodiments, a collapsed agreement per label graphical representation is created by determining the agreement per label or task if certain labels or tasks were collapsed into one another. Such a feature further indicates which labels or tasks within an ontology are more likely to be correctly placed in an ontology. It does so by creating an interface for side-by-side comparison of agreements when labels or tasks must be distinguished from one another versus when they are combined.

In some embodiments, at 1316 a per document agreement list is created by identifying those documents with the highest and lowest annotation agreements. In some embodiments, the number of documents displayed in a per document agreement list is determined from an input by a project manager or user of first GUI 212. A per document agreement list allows a project manager or user of a first GUI 212 to determine which documents gave annotators the most difficulty in agreeing on a common label or task, as well as showing which documents have unanimous agreement and make the best exemplars of a category. A project manager or user of first GUI 212 can then review certain documents to determine whether anything in particular is giving trouble to annotators, or even remove the document from the ontology for its lack of a clear disposition.

In some embodiments, method 1300 continues at 1318 to create a rules pane for receiving a phrase in a phrase pane, a weighting adjustment in a weighting adjustment pane, or a rule in an add rule pane. In some embodiments, a phrase pane is configured to identify a certain phrase within a document. In some embodiments, a weighting adjustment pane is configured to adjust the relevance of a phrase in the phrase pane relative to other words or phrases in the document, such that a natural language modeling engine will emphasize or de-emphasize certain phrases. In some embodiments, an add rule pane provides an instruction for a natural language modeling engine to apply to a phrase in a phrase pane. For example, if the natural language modeling engine recognizes a phrase from the phrase pane in a document, a rule in the add rule pane can dictate how to process that document, such as by placing it in a specific label or task of an ontology, or by looking for additional words or phrases before placing the document within an ontology.

In some embodiments, method 1300 continues at 1320 by computing initial annotation agreements from aggregated annotations, such as those aggregated at 1260 as depicted in FIG. 12. In some embodiments, computing initial agreements is performed by an annotations module, such as one of natural language modeling engine 210 depicted in FIG. 2. Computing initial annotation agreements includes not only determining annotation agreement scores among annotators for individual labels or tasks and an overall score for an ontology, but also, in some embodiments, computing a learning curve relationship between annotation agreement and number of annotations. In some embodiments at 1320, computing initial annotation agreements determines annotation agreement scores if two or more labels are collapsed into one another. In some embodiments, computing initial annotation agreements computes per label annotation agreements, per label annotation agreements if two or more labels are collapsed into one another, and graphical relationships of each. In some embodiments, computing initial annotation agreements at 1320 computes the annotation agreement for each document classified according to an ontology, and identifies the documents with the highest and lowest annotation agreements.

In some embodiments, the computed metrics of 1320 are populated into an annotation agreement interface at 1330. In some embodiments, a label feedback pane is populated with each label of an ontology. In some embodiments, at 1330 the label feedback pane is populated with a description of the label as provided by a natural language modeling engine, a project manager of first GUI 212, or an expert annotator operating a second GUI 214. In some embodiments, at 1330 the label feedback pane is populated with the number of annotations a label or task has received from annotators operating second GUI 214 or third GUI 216.

In some embodiments, at 1330 the learning curve pane is populated with the computed learning curve metric of the annotation agreement related to the number of annotations, as computed at 1320. In some embodiments, at 1330 the annotation feedback pane is populated with the annotation agreement score for the ontology. In some embodiments, at 1330 the annotation feedback pane of the annotation agreement interface is populated with a suggested label collapse list of the resultant annotation agreement scores, corresponding to how the annotation agreement score will change if two or more labels or tasks are collapsed into one another.

In some embodiments, at 1330 the annotation feedback pane of the annotation agreement interface is populated with the per label or task annotation agreements and the collapsed agreement per label or task as computed at 1320. In some embodiments, at 1330 a per document agreement list is populated in an annotation feedback pane.

In some embodiments, at 1340 the panes of the annotation agreement interface populated at 1330 are displayed to a project manager or user of first GUI 212. With a fully populated annotation agreement interface displayed to such a project manager or user of first GUI 212, method 1300 enables follow-on actions to manipulate or refine the documents and annotators to improve the ontology, such as by enabling improved annotations on other work unit interfaces, or removing documents from the ontology.

As depicted in FIG. 14, method 1400 is a method for interacting with information displayed to a project manager or user of first GUI 212 on an annotation agreement interface. In some embodiments, method 1400 begins at 1410 by receiving a request to collapse at least one first label or first task into at least one second label or second task. Such a request to collapse is made, in some embodiments, through the agreement per label graphical representation of the annotation feedback pane. In some embodiments, the request to collapse at 1410 is made through the suggested label collapse list. One of skill in the art can envision numerous ways to select which labels or tasks to collapse into one another through one of the information displays of an annotation agreement interface.

In some embodiments, at 1430 a subsequent annotation agreement is displayed in the annotation feedback pane of the annotation agreement interface after the labels or tasks identified in the request received at 1410 are collapsed. Such a subsequent annotation agreement can be used as a visual comparison of the annotation agreements before and after collapsing labels or tasks.

FIG. 15 illustrates method 1500 for changes to a work unit interface that a project manager or user of first GUI 212 can make in response to information populated on an annotation agreement interface, such as by method 1300 described in FIG. 13. In some embodiments, method 1500 is directed to replacing a guideline for a label in a work unit interface. In some embodiments, a project manager or user of first GUI 212 decides to replace a guideline for a label or task in a work unit interface, such as upon realizing that a per label annotation agreement computed at 1320 in method 1300 is low relative to other per label annotation agreements. In some embodiments, an expert annotator decides to replace a guideline for a label or task after viewing a paired guideline in a work unit interface from a second GUI. The expert annotator then enters a new guideline in the create guideline data field presented on a second GUI, such as the create guideline pane 845 as depicted in FIG. 8A.

In some embodiments, method 1500 begins at 1510 by accessing a revised guideline through a first GUI 212 or second GUI 214. In some embodiments, at 1520 the single guideline initially paired with the label or task of the work unit interface (such as at 1235 in FIG. 12) is unpaired from the respective label or task. In some embodiments, at 1530, the revised guideline accessed at 1510 is then paired with the label or task that had the single guideline unpaired at 1520. At 1540, a second reference button is created for the newly paired revised guideline with the label or task, and at 1550 the second reference button is displayed on the work unit interface. In some embodiments, the revised guideline is included in addition to the single guideline, such that when an annotator interacting with the work unit interface presses the reference button, both the single guideline and the revised guideline are displayed to the annotator.
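Steps 1510 through 1530 can be sketched as updating a per-label guideline list. The `GuidelineStore` class and its method names are hypothetical, and the creation and display of the second reference button (steps 1540 and 1550) are not modeled. The `keep_original` flag models the embodiment in which the revised guideline is shown alongside the original when the reference button is pressed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GuidelineStore:
    by_label: Dict[str, List[str]] = field(default_factory=dict)

    def pair(self, label: str, guideline: str) -> None:
        """Step 1235: pair a single guideline with a label."""
        self.by_label[label] = [guideline]

    def revise(self, label: str, revised: str, keep_original: bool = False) -> None:
        """Steps 1520-1530: unpair the old guideline (unless kept) and pair the revision."""
        current = self.by_label.get(label, [])
        self.by_label[label] = (current + [revised]) if keep_original else [revised]

    def press_reference_button(self, label: str) -> List[str]:
        """Guideline text shown to the annotator when the label's reference button is pressed."""
        return list(self.by_label.get(label, []))


store = GuidelineStore()
store.pair("Legal", "Mentions litigation.")
store.revise("Legal", "Mentions litigation or regulatory action.")
print(store.press_reference_button("Legal"))  # ['Mentions litigation or regulatory action.']
store.revise("Legal", "Excludes routine contracts.", keep_original=True)
print(store.press_reference_button("Legal"))
```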

It should be appreciated that the specific steps illustrated in FIGS. 12-15 provide a particular process and sequence of interaction among GUIs and the manipulation of the annotations generated on an ontology. Other sequences of steps may also be performed according to alternative embodiments. For example, alternative embodiments may perform the steps outlined above in a different order. Moreover, the individual sequences illustrated in FIGS. 12-15 may include multiple sub-sequences as appropriate to the individual step, or direct sequences between different nodes than those illustrated. Furthermore, steps may be added or removed depending on the particular application. One of skill in the art would recognize many variations, modifications, and alternatives.

Referring to FIG. 16, the block diagram illustrates components of a machine 1600, according to some example embodiments, able to read instructions 1624 from a machine-readable medium 1622 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part. Specifically, FIG. 16 shows the machine 1600 in the example form of a computer system (e.g., a computer) within which the instructions 1624 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1600 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.

In alternative embodiments, the machine 1600 operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 1600 may operate in the capacity of a server machine 110 or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment. The machine 1600 may include hardware, software, or combinations thereof, and may, for example, be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smartphone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1624, sequentially or otherwise, that specify actions to be taken by that machine. Further, while only a single machine 1600 is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute the instructions 1624 to perform all or part of any one or more of the methodologies discussed herein.

The machine 1600 includes a processor 1602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), a main memory 1604, and a static memory 1606, which are configured to communicate with each other via a bus 1608. The processor 1602 may contain microcircuits that are configurable, temporarily or permanently, by some or all of the instructions 1624 such that the processor 1602 is configurable to perform any one or more of the methodologies described herein, in whole or in part. For example, a set of one or more microcircuits of the processor 1602 may be configurable to execute one or more modules (e.g., software modules) described herein.

The machine 1600 may further include an input and output module 1610 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video) configured to display any one of the interfaces described herein. The machine 1600 may also include an alphanumeric input device 1612 (e.g., a keyboard or keypad), a cursor control device 1614 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, an eye tracking device, or other pointing instrument), a storage unit 1616, a signal generation device 1618 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and a network interface device 1620.

The storage unit 1616 includes the machine-readable medium 1622 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 1624 embodying any one or more of the methodologies, functions, or interfaces described herein, including, for example, any of the descriptions of FIGS. 1-15. The instructions 1624 may also reside, completely or at least partially, within the main memory 1604, within the processor 1602 (e.g., within the processor's cache memory), or both, before or during execution thereof by the machine 1600. The instructions 1624 may also reside in the static memory 1606.

Accordingly, the main memory 1604 and the processor 1602 may be considered machine-readable media 1622 (e.g., tangible and non-transitory machine-readable media). The instructions 1624 may be transmitted or received over a network 1626 via the network interface device 1620. For example, the network interface device 1620 may communicate the instructions 1624 using any one or more transfer protocols (e.g., HTTP). The machine 1600 may also represent example means for performing any of the functions described herein, including the processes described in FIGS. 1-15.

In some example embodiments, the machine 1600 may be a portable computing device, such as a smart phone or tablet computer, and have one or more additional input components (e.g., sensors or gauges) (not shown). Examples of such input components include an image input component (e.g., one or more cameras), an audio input component (e.g., a microphone), a direction input component (e.g., a compass), a location input component (e.g., a GPS receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), and a gas detection component (e.g., a gas sensor). Inputs harvested by any one or more of these input components may be accessible and available for use by any of the modules described herein.

As used herein, the term “memory” refers to a machine-readable medium 1622 able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database 115, or associated caches and servers) able to store instructions 1624. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing the instructions 1624 for execution by the machine 1600, such that the instructions 1624, when executed by one or more processors of the machine 1600 (e.g., processor 1602), cause the machine 1600 to perform any one or more of the methodologies described herein, in whole or in part. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device 130 or 150, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices 130 or 150. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible (e.g., non-transitory) data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof.

Furthermore, the machine-readable medium 1622 is non-transitory in that it does not embody a propagating signal. However, labeling the tangible machine-readable medium 1622 as “non-transitory” should not be construed to mean that the medium is incapable of movement; the medium should be considered as being transportable from one physical location to another. Additionally, since the machine-readable medium 1622 is tangible, the medium may be considered to be a machine-readable device.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute software modules (e.g., code stored or otherwise embodied on a machine-readable medium 1622 or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor 1602 or a group of processors 1602) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.

In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor 1602 or other programmable processor 1602. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses 1608) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

The various operations of example methods described herein may be performed, at least partially, by one or more processors 1602 that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors 1602 may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors 1602.

Similarly, the methods described herein may be at least partially processor-implemented, a processor 1602 being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors 1602 or processor-implemented modules. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors 1602. Moreover, the one or more processors 1602 may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines 1600 including processors 1602), with these operations being accessible via a network 1626 (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface or “API”).

The performance of certain operations may be distributed among the one or more processors 1602, not only residing within a single machine 1600, but deployed across a number of machines 1600. In some example embodiments, the one or more processors 1602 or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors 1602 or processor-implemented modules may be distributed across a number of geographic locations.

Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a natural language modeling engine 210 (e.g., on a computing device or external server such as server machine 110 depicted in FIG. 1, or as part of a system of interconnected interfaces as depicted in FIG. 2) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.

The present disclosure is illustrative and not limiting. Further modifications will be apparent to one skilled in the art in light of this disclosure and are intended to fall within the scope of the appended claims.

What is claimed is:
1. A method comprising: accessing first inputs managed through a first graphic user interface operated by a project manager, wherein accessing first inputs comprises: accessing at least one document sourced by a third party user; accessing at least one first label associated with the at least one document, wherein the at least one first label is selected from a classification ontology built from a collection comprising the at least one document; and accessing a plurality of first guidelines describing at least one first label; accessing a second input from a second graphic user interface operated by an expert annotator, wherein the second input is at least one of the group comprising at least one second label and at least one second guideline; constructing a work unit interface, wherein constructing a work unit interface comprises: assigning a document among the at least one document to a document pane within the work unit interface; assigning the at least one first label or the at least one second label to a label pane within the work unit interface; generating a human readable prompt requesting a task of the document, wherein the task is a confirmation of the accuracy for classification of a label for the document within the classification ontology; assigning the generated human readable prompt to a prompt pane within the work unit interface; pairing a single first guideline from the plurality of first guidelines or a single second guideline with a single label of the plurality of first labels or single second label; creating a reference button for each single guideline paired with a single label; and placing the reference button adjacent to the paired label within the label pane of the work unit interface; displaying to at least one annotator operating a third graphic user interface or at least one expert annotator operating the second graphic user interface the work unit interface comprising each of the document, the human readable prompt, the at least one first label or the at least one second label, and the reference button for each single guideline; receiving at least one annotation on the work unit interface displayed on the third graphic user interface or the second graphic user interface; and aggregating the at least one annotation received on the first graphic user interface operated by the project manager.
2. The method of claim 1, wherein aggregating the at least one annotation further comprises: constructing an annotation agreement interface, wherein constructing an annotation agreement interface further comprises: creating a label feedback pane, wherein the label feedback pane is configured to display at least one from the group comprising a plurality of label panes and each label pane comprising a description of a label, an indicator of the number of annotated documents associated with the label, a button to delete the label from the classification ontology, and a button to edit the label; creating a learning curve pane, wherein the learning curve pane is configured to display a graphical representation of the relationship among a number of annotations aggregated on the first graphic user interface, an agreement among annotators of a label or task of a document, and an accuracy of the classification ontology; creating an annotation feedback pane, wherein the annotation feedback pane is configured to display at least one of the group comprising a collection annotation agreement score, an individual annotator agreement list, a suggested label collapse list, an agreement per label graphical representation, a collapsed agreement per label graphical representation, and a per document agreement list; and creating a rules pane, wherein the rules pane is configured to display at least one of the group comprising a phrase pane, a weighting adjustment pane, and an add rule pane.
3. The method of claim 2, further comprising computing a plurality of initial annotation agreements from the aggregated annotations.
4. The method of claim 3, further comprising populating the annotation agreement interface's label feedback pane, learning curve pane, and annotation feedback pane with computed initial annotation agreements.
5. The method of claim 4, further comprising displaying the plurality of initial annotation agreements on the annotation agreement interface of the first graphic user interface.
6. The method of claim 5, further comprising: receiving a request through the annotation agreement interface of the first graphic user interface to collapse at least one first label or task into at least one second label or task; preparing at least one subsequent annotation agreement as computed from the initial annotation agreement of the first label or task and second label or task collapsed into one another; and displaying on the annotation agreement interface of the first graphic user interface the at least one subsequent annotation agreement among the plurality of initial annotation agreements.
7. The method of claim 6, further comprising: accessing at least one revised guideline through the first graphic user interface or second graphic user interface; unpairing at least one single guideline paired with a single label on the work unit interface; pairing the at least one revised guideline with an unpaired single label on the work unit interface; creating a second reference button for the at least one paired revised guideline; and displaying the second reference button within the work unit interface's label pane adjacent to the single label.
8. A non-transitory computer readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations comprising: accessing first inputs managed through a first graphic user interface operated by a project manager, wherein accessing first inputs comprises: accessing at least one document sourced by a third party user; accessing at least one first label associated with the at least one document, wherein the at least one first label is selected from a classification ontology built from a collection comprising the at least one document; and accessing a plurality of first guidelines describing at least one first label; accessing a second input from a second graphic user interface operated by an expert annotator, wherein the second input is at least one of the group comprising at least one second label and at least one second guideline; constructing a work unit interface, wherein constructing a work unit interface comprises: assigning a document among the at least one document to a document pane within the work unit interface; assigning the at least one first label or the at least one second label to a label pane within the work unit interface; generating a human readable prompt requesting a task of the document, wherein the task is a confirmation of the accuracy for classification of a label for the document within the classification ontology; assigning the generated human readable prompt to a prompt pane within the work unit interface; pairing a single first guideline from the plurality of first guidelines or a single second guideline with a single label of the plurality of first labels or single second label; creating a reference button for each single guideline paired with a single label; and placing the reference button adjacent to the paired label within the label pane of the work unit interface; displaying to at least one annotator operating a third graphic user interface or at least one expert annotator operating the second graphic user interface the work unit interface comprising each of the document, the human readable prompt, the at least one first label or the at least one second label, and the reference button for each single guideline; receiving at least one annotation on the work unit interface displayed on the third graphic user interface or the second graphic user interface; and aggregating the at least one annotation received on the first graphic user interface operated by the project manager.
9. The computer readable medium of claim 8, wherein the operations to aggregate the at least one annotation further comprise: constructing an annotation agreement interface, wherein constructing an annotation agreement interface further comprises: creating a label feedback pane, wherein the label feedback pane is configured to display at least one from the group comprising a plurality of label panes and each label pane comprising a description of a label, an indicator of the number of annotated documents associated with the label, a button to delete the label from the classification ontology, and a button to edit the label; creating a learning curve pane, wherein the learning curve pane is configured to display a graphical representation of the relationship among a number of annotations aggregated on the first graphic user interface, an agreement among annotators of a label or task of a document, and an accuracy of the classification ontology; creating an annotation feedback pane, wherein the annotation feedback pane is configured to display at least one of the group comprising a collection annotation agreement score, an individual annotator agreement list, a suggested label collapse list, an agreement per label graphical representation, a collapsed agreement per label graphical representation, and a per document agreement list; and creating a rules pane, wherein the rules pane is configured to display at least one of the group comprising a phrase pane, a weighting adjustment pane, and an add rule pane.
10. The computer readable medium of claim 9, wherein the operations further comprise: computing a plurality of initial annotation agreements from the aggregated annotations.
11. The computer readable medium of claim 10, wherein the operations further comprise: populating the annotation agreement interface's label feedback pane, learning curve pane, and annotation feedback pane with computed initial annotation agreements.
12. The computer readable medium of claim 11, wherein the operations further comprise: displaying the plurality of initial annotation agreements on the annotation agreement interface of the first graphic user interface.
13. The computer readable medium of claim 12, wherein the operations further comprise: receiving a request through the annotation agreement interface of the first graphic user interface to collapse at least one first label or task into at least one second label or task; preparing at least one subsequent annotation agreement as computed from the initial annotation agreement of the first label or task and second label or task collapsed into one another; and displaying on the annotation agreement interface of the first graphic user interface the at least one subsequent annotation agreement among the plurality of initial annotation agreements.
14. The computer readable medium of claim 13, wherein the operations further comprise: accessing at least one revised guideline through the first graphic user interface or second graphic user interface; unpairing at least one single guideline paired with a single label on the work unit interface; pairing the at least one revised guideline with an unpaired single label on the work unit interface; creating a second reference button for the at least one paired revised guideline; and displaying the second reference button within the work unit interface's label pane adjacent to the single label.
15. An interface integration system comprising: a data processor; an input and output module from at least one of the group comprising a first graphic user interface associated with a project manager, a second graphic user interface operated by an expert annotator, and a third graphic user interface operated by an annotator; a natural language modeling engine operably coupled to the input and output module, configured to execute instructions received from the data processor to: access first inputs managed by the first graphic user interface, wherein the access to first inputs further comprises: access at least one document sourced by a third party user; access at least one first label associated with the at least one document, wherein the at least one first label is selected from a classification ontology built from a collection comprising the at least one document; and access a plurality of first guidelines describing at least one first label; access a second input from a second graphic user interface, wherein the second input is at least one of the group comprising at least one second label and at least one second guideline; construct a work unit interface, wherein to construct a work unit interface the natural language modeling engine is further configured to: assign a document among the at least one document to a document pane within the work unit interface; assign the at least one first label or the at least one second label to a label pane within the work unit interface; generate a human readable prompt requesting a task of the document, wherein the task is a confirmation of the accuracy for classification of a label for the document within the classification ontology; assign the generated human readable prompt to a prompt pane within the work unit interface; pair a single first guideline from the plurality of first guidelines or a single second guideline with a single label of the plurality of first labels or single second label; create a reference button for each single guideline paired with a single label; and place the reference button adjacent to the paired label within the label pane of the work unit interface; display to at least one annotator operating a third graphic user interface or at least one expert annotator operating the second graphic user interface the work unit interface comprising each of the document, the human readable prompt, the at least one first label or the at least one second label, and the reference button for each single guideline; receive at least one annotation on the work unit interface displayed on the third graphic user interface; and aggregate the at least one annotation in an annotation agreement interface of the first graphic user interface associated with a project manager.
16. The interface integration system of claim 15, wherein the natural language modeling engine operably coupled to the input and output module is further configured to execute instructions received from the data processor to: construct an annotation agreement interface, wherein the instructions to construct an annotation agreement interface further comprise instructions to: create a label feedback pane, wherein the instructions to create a label feedback pane configure the display of at least one from the group comprising a plurality of label panes and each label pane comprising a description of a label, an indicator of the number of annotated documents associated with the label, a button to delete the label from the classification ontology, and a button to edit the label; create a learning curve pane, wherein the instructions to create a learning curve pane configure the display of a graphical representation of the relationship among a number of annotations aggregated on the first graphic user interface, an agreement among annotators of a label or task of a document, and an accuracy of the classification ontology; create an annotation feedback pane, wherein the instructions to create an annotation feedback pane configure the display of at least one of the group comprising a collection annotation agreement score, an individual annotator agreement list, a suggested label collapse list, an agreement per label graphical representation, a collapsed agreement per label graphical representation, and a per document agreement list; and create a rules pane, wherein the instructions to create a rules pane configure the display of at least one of the group comprising a phrase pane, a weighting adjustment pane, and an add rule pane.
17. The interface integration system of claim 16, wherein the natural language modeling engine operably coupled to the input and output module is further configured to execute instructions received from the data processor to: compute a plurality of initial annotation agreements from the aggregated annotations; and populate the annotation agreement interface's label feedback pane, learning curve pane, and annotation feedback pane.
18. The interface integration system of claim 17, wherein the natural language modeling engine operably coupled to the input and output module is further configured to execute instructions from the data processor to display the plurality of initial annotation agreements on the annotation agreement interface of the first graphic user interface.
19. The interface integration system of claim 18, wherein the natural language modeling engine operably coupled to the input and output module is further configured to execute instructions received from the data processor to: receive a request through the annotation agreement interface of the first graphic user interface to collapse at least one first label or task into at least one second label or task; prepare at least one subsequent annotation agreement as computed from the initial annotation agreement of the first label or task and second label or task collapsed into one another; and display on the annotation agreement interface of the first graphic user interface the at least one subsequent annotation agreement among the plurality of initial annotation agreements.
20. The interface integration system of claim 19, wherein the natural language modeling engine operably coupled to the input and output module is further configured to execute instructions received from the data processor to: access at least one revised guideline through the first graphic user interface or second graphic user interface; unpair at least one single guideline paired with a single label on the work unit interface; pair the at least one revised guideline with an unpaired single label on the work unit interface; create a second reference button for the at least one paired revised guideline; and display the second reference button within the work unit interface's label pane adjacent to the single label.